Data Exploration

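library(tidyverse)  # read_csv, dplyr verbs, %>%
library(magrittr)   # %<>% assignment pipe
library(scales)
library(hexbin)
# Read the loan data and convert every character column to a factor
data <- read_csv("prosperLoanData.csv") %>% mutate_if(is.character, as.factor)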

Data Cleaning

Above, I imported the character columns as factors: on closer inspection, they hold category labels rather than free-form strings, and nothing in the following analysis contradicts this. The first thing I will do now is check whether the other columns are formatted appropriately:

data[,1:7]
## # A tibble: 113,937 x 7
##    ListingKey          ListingNumber ListingCreationDate CreditGrade  Term
##    <fct>                       <int> <dttm>              <fct>       <int>
##  1 102133976686814541…        193129 2007-08-26 19:09:29 C              36
##  2 10273602499503308B…       1209647 2014-02-27 08:28:07 <NA>           36
##  3 0EE933782585103286…         81716 2007-01-05 15:00:47 HR             36
##  4 0EF535600248271529…        658116 2012-10-22 11:02:35 <NA>           36
##  5 0F023589499656230C…        909464 2013-09-14 18:38:39 <NA>           36
##  6 0F0535973482419938…       1074836 2013-12-14 08:26:37 <NA>           60
##  7 0F0A3576754255009D…        750899 2013-04-12 09:52:56 <NA>           36
##  8 0F1035772717087366…        768193 2013-05-05 06:49:27 <NA>           36
##  9 0F043596202561788E…       1023355 2013-12-02 10:43:39 <NA>           36
## 10 0F043596202561788E…       1023355 2013-12-02 10:43:39 <NA>           36
## # ... with 113,927 more rows, and 2 more variables: LoanStatus <fct>,
## #   ClosedDate <dttm>
str(data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : POSIXct, format: "2007-08-26 19:09:29" "2014-02-27 08:28:07" ...
##  $ CreditGrade                        : Factor w/ 8 levels "A","AA","B","C",..: 4 NA 7 NA NA NA NA NA NA NA ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : POSIXct, format: "2009-08-14" NA ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating (numeric)            : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating (Alpha)              : Factor w/ 7 levels "A","AA","B","C",..: NA 1 NA 1 5 3 6 4 2 2 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory (numeric)          : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 51 levels "AK","AL","AR",..: 6 6 11 11 24 33 17 5 15 15 ...
##  $ Occupation                         : Factor w/ 67 levels "Accountant/CPA",..: 36 42 36 51 20 42 49 28 23 23 ...
##  $ EmploymentStatus                   : Factor w/ 8 levels "Employed","Full-time",..: 8 1 3 1 1 1 1 1 1 1 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : Factor w/ 2 levels "False","True": 2 1 1 2 2 2 1 1 2 2 ...
##  $ CurrentlyInGroup                   : Factor w/ 2 levels "False","True": 2 1 2 1 1 1 1 1 1 1 ...
##  $ GroupKey                           : Factor w/ 706 levels "00343376901312423168731",..: NA NA 334 NA NA NA NA NA NA NA ...
##  $ DateCreditPulled                   : POSIXct, format: "2007-08-26 18:41:46" "2014-02-27 08:28:14" ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : POSIXct, format: "2001-10-11" "1996-03-18" ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent (percentage) : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Factor w/ 8 levels "$0","$1-24,999",..: 4 5 7 4 3 3 4 4 4 4 ...
##  $ IncomeVerifiable                   : Factor w/ 2 levels "False","True": 2 2 2 2 2 2 2 2 2 2 ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : POSIXct, format: "2007-09-12" "2014-03-03" ...
##  $ LoanOriginationQuarter             : Factor w/ 33 levels "Q1 2006","Q1 2007",..: 18 8 2 32 24 33 16 16 33 33 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
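
One way to test the claim that the factor columns hold category labels is to compare each factor's number of levels with the number of rows. In the output above, ListingKey and LoanKey have 113,066 levels across 113,937 rows, which looks more like an identifier than a category. The sketch below runs that check on a small made-up data frame; `toy` and its columns are illustrative and not part of the data set.

```r
# Toy data frame standing in for the loan data (values are made up).
toy <- data.frame(
  Key    = factor(c("a1", "b2", "c3", "d4", "e5", "e5")),
  Status = factor(c("Current", "Completed", "Current",
                    "Current", "Defaulted", "Defaulted")),
  Amount = c(1000, 2500, 4000, 1500, 3000, 3000)
)

# Number of levels for every factor column.
factor_cols <- Filter(is.factor, toy)
n_levels <- vapply(factor_cols, nlevels, integer(1))

# Share of rows that carry a distinct value: close to 1 suggests an
# identifier, close to 0 suggests a genuine category.
level_share <- n_levels / nrow(toy)
print(round(level_share, 2))
```

On the real data the same two lines could be applied to `data` directly; where to draw the line between "identifier" and "category" is a judgment call.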

The first thing I notice is that several date columns are stored as date-times (POSIXct) and should be plain dates, and that several True/False columns are stored as factors and should be logical. I also want to order the levels in some of the factor columns, as they are inherently ordered (CreditGrade, ProsperRating.alpha, IncomeRange, LoanOriginationQuarter). Finally, several column names contain spaces or special characters, which makes these columns awkward to refer to, so I will rename them.

data %<>% 
  # Drop the time component from the date-time columns
  mutate_at(c("ListingCreationDate","ClosedDate","DateCreditPulled","FirstRecordedCreditLine","LoanOriginationDate"), as.Date) %>%
  # Convert the "True"/"False" factors to logicals
  mutate_at(c("IsBorrowerHomeowner","CurrentlyInGroup","IncomeVerifiable"), as.logical) %>%
  # Replace the parenthesised suffixes in column names with syntactic ones
  rename_all(~sub(" (numeric)", ".num", ., fixed=TRUE)) %>%
  rename_all(~sub(" (Alpha)", ".alpha", ., fixed=TRUE)) %>%
  rename_all(~sub(" (percentage)", ".per", ., fixed=TRUE))

data$CreditGrade <- ordered(data$CreditGrade, c("NC","HR","E","D","C","B","A","AA"))
data$ProsperRating.alpha <- ordered(data$ProsperRating.alpha, c("NC","HR","E","D","C","B","A","AA"))
data$IncomeRange <- ordered(data$IncomeRange, c("Not displayed","Not employed","$0","$1-24,999","$25,000-49,999","$50,000-74,999","$75,000-99,999","$100,000+"))
data$LoanOriginationQuarter <- ordered(data$LoanOriginationQuarter, c("Q1 2006", "Q2 2006", "Q3 2006", "Q4 2006", "Q1 2007", "Q2 2007", "Q3 2007", "Q4 2007", "Q1 2008", "Q2 2008", "Q3 2008", "Q4 2008", "Q1 2009", "Q2 2009", "Q3 2009", "Q4 2009", "Q1 2010", "Q2 2010", "Q3 2010", "Q4 2010", "Q1 2011", "Q2 2011", "Q3 2011", "Q4 2011", "Q1 2012", "Q2 2012", "Q3 2012", "Q4 2012", "Q1 2013", "Q2 2013", "Q3 2013", "Q4 2013", "Q1 2014", "Q2 2014", "Q3 2014", "Q4 2014"))
str(data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    113937 obs. of  81 variables:
##  $ ListingKey                         : Factor w/ 113066 levels "00003546482094282EF90E5",..: 7180 7193 6647 6669 6686 6689 6699 6706 6687 6687 ...
##  $ ListingNumber                      : int  193129 1209647 81716 658116 909464 1074836 750899 768193 1023355 1023355 ...
##  $ ListingCreationDate                : Date, format: "2007-08-26" "2014-02-27" ...
##  $ CreditGrade                        : Ord.factor w/ 8 levels "NC"<"HR"<"E"<..: 5 NA 2 NA NA NA NA NA NA NA ...
##  $ Term                               : int  36 36 36 36 36 60 36 36 36 36 ...
##  $ LoanStatus                         : Factor w/ 12 levels "Cancelled","Chargedoff",..: 3 4 3 4 4 4 4 4 4 4 ...
##  $ ClosedDate                         : Date, format: "2009-08-14" NA ...
##  $ BorrowerAPR                        : num  0.165 0.12 0.283 0.125 0.246 ...
##  $ BorrowerRate                       : num  0.158 0.092 0.275 0.0974 0.2085 ...
##  $ LenderYield                        : num  0.138 0.082 0.24 0.0874 0.1985 ...
##  $ EstimatedEffectiveYield            : num  NA 0.0796 NA 0.0849 0.1832 ...
##  $ EstimatedLoss                      : num  NA 0.0249 NA 0.0249 0.0925 ...
##  $ EstimatedReturn                    : num  NA 0.0547 NA 0.06 0.0907 ...
##  $ ProsperRating.num                  : int  NA 6 NA 6 3 5 2 4 7 7 ...
##  $ ProsperRating.alpha                : Ord.factor w/ 8 levels "NC"<"HR"<"E"<..: NA 7 NA 7 4 6 3 5 8 8 ...
##  $ ProsperScore                       : num  NA 7 NA 9 4 10 2 4 9 11 ...
##  $ ListingCategory.num                : int  0 2 0 16 2 1 1 2 7 7 ...
##  $ BorrowerState                      : Factor w/ 51 levels "AK","AL","AR",..: 6 6 11 11 24 33 17 5 15 15 ...
##  $ Occupation                         : Factor w/ 67 levels "Accountant/CPA",..: 36 42 36 51 20 42 49 28 23 23 ...
##  $ EmploymentStatus                   : Factor w/ 8 levels "Employed","Full-time",..: 8 1 3 1 1 1 1 1 1 1 ...
##  $ EmploymentStatusDuration           : int  2 44 NA 113 44 82 172 103 269 269 ...
##  $ IsBorrowerHomeowner                : logi  TRUE FALSE FALSE TRUE TRUE TRUE ...
##  $ CurrentlyInGroup                   : logi  TRUE FALSE TRUE FALSE FALSE FALSE ...
##  $ GroupKey                           : Factor w/ 706 levels "00343376901312423168731",..: NA NA 334 NA NA NA NA NA NA NA ...
##  $ DateCreditPulled                   : Date, format: "2007-08-26" "2014-02-27" ...
##  $ CreditScoreRangeLower              : int  640 680 480 800 680 740 680 700 820 820 ...
##  $ CreditScoreRangeUpper              : int  659 699 499 819 699 759 699 719 839 839 ...
##  $ FirstRecordedCreditLine            : Date, format: "2001-10-11" "1996-03-18" ...
##  $ CurrentCreditLines                 : int  5 14 NA 5 19 21 10 6 17 17 ...
##  $ OpenCreditLines                    : int  4 14 NA 5 19 17 7 6 16 16 ...
##  $ TotalCreditLinespast7years         : int  12 29 3 29 49 49 20 10 32 32 ...
##  $ OpenRevolvingAccounts              : int  1 13 0 7 6 13 6 5 12 12 ...
##  $ OpenRevolvingMonthlyPayment        : num  24 389 0 115 220 1410 214 101 219 219 ...
##  $ InquiriesLast6Months               : int  3 3 0 0 1 0 0 3 1 1 ...
##  $ TotalInquiries                     : num  3 5 1 1 9 2 0 16 6 6 ...
##  $ CurrentDelinquencies               : int  2 0 1 4 0 0 0 0 0 0 ...
##  $ AmountDelinquent                   : num  472 0 NA 10056 0 ...
##  $ DelinquenciesLast7Years            : int  4 0 0 14 0 0 0 0 0 0 ...
##  $ PublicRecordsLast10Years           : int  0 1 0 0 0 0 0 1 0 0 ...
##  $ PublicRecordsLast12Months          : int  0 0 NA 0 0 0 0 0 0 0 ...
##  $ RevolvingCreditBalance             : num  0 3989 NA 1444 6193 ...
##  $ BankcardUtilization                : num  0 0.21 NA 0.04 0.81 0.39 0.72 0.13 0.11 0.11 ...
##  $ AvailableBankcardCredit            : num  1500 10266 NA 30754 695 ...
##  $ TotalTrades                        : num  11 29 NA 26 39 47 16 10 29 29 ...
##  $ TradesNeverDelinquent.per          : num  0.81 1 NA 0.76 0.95 1 0.68 0.8 1 1 ...
##  $ TradesOpenedLast6Months            : num  0 2 NA 0 2 0 0 0 1 1 ...
##  $ DebtToIncomeRatio                  : num  0.17 0.18 0.06 0.15 0.26 0.36 0.27 0.24 0.25 0.25 ...
##  $ IncomeRange                        : Ord.factor w/ 8 levels "Not displayed"<..: 5 6 1 5 8 8 5 5 5 5 ...
##  $ IncomeVerifiable                   : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ StatedMonthlyIncome                : num  3083 6125 2083 2875 9583 ...
##  $ LoanKey                            : Factor w/ 113066 levels "00003683605746079487FF7",..: 100337 69837 46303 70776 71387 86505 91250 5425 908 908 ...
##  $ TotalProsperLoans                  : int  NA NA NA NA 1 NA NA NA NA NA ...
##  $ TotalProsperPaymentsBilled         : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ OnTimeProsperPayments              : int  NA NA NA NA 11 NA NA NA NA NA ...
##  $ ProsperPaymentsLessThanOneMonthLate: int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPaymentsOneMonthPlusLate    : int  NA NA NA NA 0 NA NA NA NA NA ...
##  $ ProsperPrincipalBorrowed           : num  NA NA NA NA 11000 NA NA NA NA NA ...
##  $ ProsperPrincipalOutstanding        : num  NA NA NA NA 9948 ...
##  $ ScorexChangeAtTimeOfListing        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanCurrentDaysDelinquent          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ LoanFirstDefaultedCycleNumber      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ LoanMonthsSinceOrigination         : int  78 0 86 16 6 3 11 10 3 3 ...
##  $ LoanNumber                         : int  19141 134815 6466 77296 102670 123257 88353 90051 121268 121268 ...
##  $ LoanOriginalAmount                 : int  9425 10000 3001 10000 15000 15000 3000 10000 10000 10000 ...
##  $ LoanOriginationDate                : Date, format: "2007-09-12" "2014-03-03" ...
##  $ LoanOriginationQuarter             : Ord.factor w/ 36 levels "Q1 2006"<"Q2 2006"<..: 7 33 5 28 31 32 30 30 32 32 ...
##  $ MemberKey                          : Factor w/ 90831 levels "00003397697413387CAF966",..: 11071 10302 33781 54939 19465 48037 60448 40951 26129 26129 ...
##  $ MonthlyLoanPayment                 : num  330 319 123 321 564 ...
##  $ LP_CustomerPayments                : num  11396 0 4187 5143 2820 ...
##  $ LP_CustomerPrincipalPayments       : num  9425 0 3001 4091 1563 ...
##  $ LP_InterestandFees                 : num  1971 0 1186 1052 1257 ...
##  $ LP_ServiceFees                     : num  -133.2 0 -24.2 -108 -60.3 ...
##  $ LP_CollectionFees                  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_GrossPrincipalLoss              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NetPrincipalLoss                : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LP_NonPrincipalRecoverypayments    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ PercentFunded                      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Recommendations                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsCount         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ InvestmentFromFriendsAmount        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Investors                          : int  258 1 41 158 20 1 1 1 1 1 ...
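
Typing out 36 quarter labels by hand is easy to get wrong, and `ordered()` silently turns any value that is not in the level list into NA. The summary below reports 22 NAs for LoanOriginationQuarter while the earliest LoanOriginationDate is 2005-11-15, which suggests that "Q4 2005" occurs in the data but is absent from the list above. A minimal sketch of generating the labels and checking for dropped values, using a toy factor in place of the real column (the year range is an assumption based on the dates in the summary):

```r
# Toy stand-in for data$LoanOriginationQuarter (values are illustrative).
quarters <- factor(c("Q4 2005", "Q1 2006", "Q3 2007", "Q1 2014", "Q1 2006"))

# Build "Qn YYYY" labels in chronological order instead of typing them out.
years <- 2005:2014
quarter_levels <- paste0("Q", 1:4, " ", rep(years, each = 4))

# Any observed value missing from the level list would become NA.
missing_levels <- setdiff(levels(quarters), quarter_levels)
print(missing_levels)

# Keep only the quarters that actually occur, in chronological order.
observed <- quarter_levels[quarter_levels %in% levels(quarters)]
quarters_ordered <- ordered(quarters, levels = observed)
print(levels(quarters_ordered))
```

If `missing_levels` is empty and the ordered factor contains no new NAs, the conversion has not dropped any rows.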

First Impressions

Now I want to take a look at a summary of the data, to try to figure out what might be going on:

summary(data)
##                    ListingKey     ListingNumber     ListingCreationDate 
##  17A93590655669644DB4C06:     6   Min.   :      4   Min.   :2005-11-09  
##  349D3587495831350F0F648:     4   1st Qu.: 400919   1st Qu.:2008-09-19  
##  47C1359638497431975670B:     4   Median : 600554   Median :2012-06-16  
##  8474358854651984137201C:     4   Mean   : 627886   Mean   :2011-07-08  
##  DE8535960513435199406CE:     4   3rd Qu.: 892634   3rd Qu.:2013-09-09  
##  04C13599434217079754AEE:     3   Max.   :1255725   Max.   :2014-03-10  
##  (Other)                :113912                                         
##   CreditGrade         Term                       LoanStatus   
##  C      : 5649   Min.   :12.00   Current              :56576  
##  D      : 5153   1st Qu.:36.00   Completed            :38074  
##  B      : 4389   Median :36.00   Chargedoff           :11992  
##  AA     : 3509   Mean   :40.83   Defaulted            : 5018  
##  HR     : 3508   3rd Qu.:36.00   Past Due (1-15 days) :  806  
##  (Other): 6745   Max.   :60.00   Past Due (31-60 days):  363  
##  NA's   :84984                   (Other)              : 1108  
##    ClosedDate          BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :2005-11-25   Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:2009-07-14   1st Qu.:0.15629   1st Qu.:0.1340   1st Qu.: 0.1242  
##  Median :2011-04-05   Median :0.20976   Median :0.1840   Median : 0.1730  
##  Mean   :2011-03-07   Mean   :0.21883   Mean   :0.1928   Mean   : 0.1827  
##  3rd Qu.:2013-01-30   3rd Qu.:0.28381   3rd Qu.:0.2500   3rd Qu.: 0.2400  
##  Max.   :2014-03-10   Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##  NA's   :58848        NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn 
##  Min.   :-0.183          Min.   :0.005   Min.   :-0.183  
##  1st Qu.: 0.116          1st Qu.:0.042   1st Qu.: 0.074  
##  Median : 0.162          Median :0.072   Median : 0.092  
##  Mean   : 0.169          Mean   :0.080   Mean   : 0.096  
##  3rd Qu.: 0.224          3rd Qu.:0.112   3rd Qu.: 0.117  
##  Max.   : 0.320          Max.   :0.366   Max.   : 0.284  
##  NA's   :29084           NA's   :29084   NA's   :29084   
##  ProsperRating.num ProsperRating.alpha  ProsperScore   ListingCategory.num
##  Min.   :1.000     C      :18345       Min.   : 1.00   Min.   : 0.000     
##  1st Qu.:3.000     B      :15581       1st Qu.: 4.00   1st Qu.: 1.000     
##  Median :4.000     A      :14551       Median : 6.00   Median : 1.000     
##  Mean   :4.072     D      :14274       Mean   : 5.95   Mean   : 2.774     
##  3rd Qu.:5.000     E      : 9795       3rd Qu.: 8.00   3rd Qu.: 3.000     
##  Max.   :7.000     (Other):12307       Max.   :11.00   Max.   :20.000     
##  NA's   :29084     NA's   :29084       NA's   :29084                      
##  BorrowerState                 Occupation         EmploymentStatus
##  CA     :14717   Other              :28617   Employed     :67322  
##  TX     : 6842   Professional       :13628   Full-time    :26355  
##  NY     : 6729   Computer Programmer: 4478   Self-employed: 6134  
##  FL     : 6720   Executive          : 4311   Not available: 5347  
##  IL     : 5921   Teacher            : 3759   Other        : 3806  
##  (Other):67493   (Other)            :55556   (Other)      : 2718  
##  NA's   : 5515   NA's               : 3588   NA's         : 2255  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           Mode :logical       Mode :logical   
##  1st Qu.: 26.00           FALSE:56459         FALSE:101218    
##  Median : 67.00           TRUE :57478         TRUE :12719     
##  Mean   : 96.07                                               
##  3rd Qu.:137.00                                               
##  Max.   :755.00                                               
##  NA's   :7625                                                 
##                     GroupKey      DateCreditPulled    
##  783C3371218786870A73D20:  1140   Min.   :2005-11-09  
##  3D4D3366260257624AB272D:   916   1st Qu.:2008-09-16  
##  6A3B336601725506917317E:   698   Median :2012-06-17  
##  FEF83377364176536637E50:   611   Mean   :2011-07-09  
##  C9643379247860156A00EC0:   342   3rd Qu.:2013-09-11  
##  (Other)                :  9634   Max.   :2014-03-10  
##  NA's                   :100596                       
##  CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
##  Min.   :  0.0         Min.   : 19.0         Min.   :1947-08-24     
##  1st Qu.:660.0         1st Qu.:679.0         1st Qu.:1990-06-01     
##  Median :680.0         Median :699.0         Median :1995-11-01     
##  Mean   :685.6         Mean   :704.6         Mean   :1994-11-17     
##  3rd Qu.:720.0         3rd Qu.:739.0         3rd Qu.:2000-03-14     
##  Max.   :880.0         Max.   :899.0         Max.   :2012-12-22     
##  NA's   :591           NA's   :591           NA's   :697            
##  CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
##  Min.   : 0.00      Min.   : 0.00   Min.   :  2.00            
##  1st Qu.: 7.00      1st Qu.: 6.00   1st Qu.: 17.00            
##  Median :10.00      Median : 9.00   Median : 25.00            
##  Mean   :10.32      Mean   : 9.26   Mean   : 26.75            
##  3rd Qu.:13.00      3rd Qu.:12.00   3rd Qu.: 35.00            
##  Max.   :59.00      Max.   :54.00   Max.   :136.00            
##  NA's   :7604       NA's   :7604    NA's   :697               
##  OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
##  Min.   : 0.00         Min.   :    0.0             Min.   :  0.000     
##  1st Qu.: 4.00         1st Qu.:  114.0             1st Qu.:  0.000     
##  Median : 6.00         Median :  271.0             Median :  1.000     
##  Mean   : 6.97         Mean   :  398.3             Mean   :  1.435     
##  3rd Qu.: 9.00         3rd Qu.:  525.0             3rd Qu.:  2.000     
##  Max.   :51.00         Max.   :14985.0             Max.   :105.000     
##                                                    NA's   :697         
##  TotalInquiries    CurrentDelinquencies AmountDelinquent  
##  Min.   :  0.000   Min.   : 0.0000      Min.   :     0.0  
##  1st Qu.:  2.000   1st Qu.: 0.0000      1st Qu.:     0.0  
##  Median :  4.000   Median : 0.0000      Median :     0.0  
##  Mean   :  5.584   Mean   : 0.5921      Mean   :   984.5  
##  3rd Qu.:  7.000   3rd Qu.: 0.0000      3rd Qu.:     0.0  
##  Max.   :379.000   Max.   :83.0000      Max.   :463881.0  
##  NA's   :1159      NA's   :697          NA's   :7622      
##  DelinquenciesLast7Years PublicRecordsLast10Years
##  Min.   : 0.000          Min.   : 0.0000         
##  1st Qu.: 0.000          1st Qu.: 0.0000         
##  Median : 0.000          Median : 0.0000         
##  Mean   : 4.155          Mean   : 0.3126         
##  3rd Qu.: 3.000          3rd Qu.: 0.0000         
##  Max.   :99.000          Max.   :38.0000         
##  NA's   :990             NA's   :697             
##  PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
##  Min.   : 0.000            Min.   :      0        Min.   :0.000      
##  1st Qu.: 0.000            1st Qu.:   3121        1st Qu.:0.310      
##  Median : 0.000            Median :   8549        Median :0.600      
##  Mean   : 0.015            Mean   :  17599        Mean   :0.561      
##  3rd Qu.: 0.000            3rd Qu.:  19521        3rd Qu.:0.840      
##  Max.   :20.000            Max.   :1435667        Max.   :5.950      
##  NA's   :7604              NA's   :7604           NA's   :7604       
##  AvailableBankcardCredit  TotalTrades     TradesNeverDelinquent.per
##  Min.   :     0          Min.   :  0.00   Min.   :0.000            
##  1st Qu.:   880          1st Qu.: 15.00   1st Qu.:0.820            
##  Median :  4100          Median : 22.00   Median :0.940            
##  Mean   : 11210          Mean   : 23.23   Mean   :0.886            
##  3rd Qu.: 13180          3rd Qu.: 30.00   3rd Qu.:1.000            
##  Max.   :646285          Max.   :126.00   Max.   :1.000            
##  NA's   :7544            NA's   :7544     NA's   :7544             
##  TradesOpenedLast6Months DebtToIncomeRatio         IncomeRange   
##  Min.   : 0.000          Min.   : 0.000    $25,000-49,999:32192  
##  1st Qu.: 0.000          1st Qu.: 0.140    $50,000-74,999:31050  
##  Median : 0.000          Median : 0.220    $100,000+     :17337  
##  Mean   : 0.802          Mean   : 0.276    $75,000-99,999:16916  
##  3rd Qu.: 1.000          3rd Qu.: 0.320    Not displayed : 7741  
##  Max.   :20.000          Max.   :10.010    $1-24,999     : 7274  
##  NA's   :7544            NA's   :8554      (Other)       : 1427  
##  IncomeVerifiable StatedMonthlyIncome                    LoanKey      
##  Mode :logical    Min.   :      0     CB1B37030986463208432A1:     6  
##  FALSE:8669       1st Qu.:   3200     2DEE3698211017519D7333F:     4  
##  TRUE :105268     Median :   4667     9F4B37043517554537C364C:     4  
##                   Mean   :   5608     D895370150591392337ED6D:     4  
##                   3rd Qu.:   6825     E6FB37073953690388BC56D:     4  
##                   Max.   :1750003     0D8F37036734373301ED419:     3  
##                                       (Other)                :113912  
##  TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :0.00      Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:1.00      1st Qu.:  9.00             1st Qu.:  9.00       
##  Median :1.00      Median : 16.00             Median : 15.00       
##  Mean   :1.42      Mean   : 22.93             Mean   : 22.27       
##  3rd Qu.:2.00      3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :8.00      Max.   :141.00             Max.   :141.00       
##  NA's   :91852     NA's   :91852              NA's   :91852        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.61                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :91852                       NA's   :91852                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0              
##  1st Qu.: 3500            1st Qu.:    0              
##  Median : 6000            Median : 1627              
##  Mean   : 8472            Mean   : 2930              
##  3rd Qu.:11000            3rd Qu.: 4127              
##  Max.   :72499            Max.   :23451              
##  NA's   :91852            NA's   :91852              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.00             Min.   :   0.0           
##  1st Qu.: -35.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -3.22             Mean   : 152.8           
##  3rd Qu.:  25.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :2704.0           
##  NA's   :95009                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 0.00                 Min.   :  0.0              Min.   :     1  
##  1st Qu.: 9.00                 1st Qu.:  6.0              1st Qu.: 37332  
##  Median :14.00                 Median : 21.0              Median : 68599  
##  Mean   :16.27                 Mean   : 31.9              Mean   : 69444  
##  3rd Qu.:22.00                 3rd Qu.: 65.0              3rd Qu.:101901  
##  Max.   :44.00                 Max.   :100.0              Max.   :136486  
##  NA's   :96985                                                            
##  LoanOriginalAmount LoanOriginationDate  LoanOriginationQuarter
##  Min.   : 1000      Min.   :2005-11-15   Q4 2013:14450         
##  1st Qu.: 4000      1st Qu.:2008-10-02   Q1 2014:12172         
##  Median : 6500      Median :2012-06-26   Q3 2013: 9180         
##  Mean   : 8337      Mean   :2011-07-21   Q2 2013: 7099         
##  3rd Qu.:12000      3rd Qu.:2013-09-18   Q3 2012: 5632         
##  Max.   :35000      Max.   :2014-03-12   (Other):65382         
##                                          NA's   :   22         
##                    MemberKey      MonthlyLoanPayment LP_CustomerPayments
##  63CA34120866140639431C9:     9   Min.   :   0.0     Min.   :   -2.35   
##  16083364744933457E57FB9:     8   1st Qu.: 131.6     1st Qu.: 1005.76   
##  3A2F3380477699707C81385:     8   Median : 217.7     Median : 2583.83   
##  4D9C3403302047712AD0CDD:     8   Mean   : 272.5     Mean   : 4183.08   
##  739C338135235294782AE75:     8   3rd Qu.: 371.6     3rd Qu.: 5548.40   
##  7E1733653050264822FAA3D:     8   Max.   :2251.5     Max.   :40702.39   
##  (Other)                :113888                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :   -2.35   Min.   :-664.87  
##  1st Qu.:  500.9              1st Qu.:  274.87   1st Qu.: -73.18  
##  Median : 1587.5              Median :  700.84   Median : -34.44  
##  Mean   : 3105.5              Mean   : 1077.54   Mean   : -54.73  
##  3rd Qu.: 4000.0              3rd Qu.: 1458.54   3rd Qu.: -13.92  
##  Max.   :35000.0              Max.   :15617.03   Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :  -94.2       Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -14.24   Mean   :  700.4       Mean   :  681.4    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations   
##  Min.   :    0.00                Min.   :0.7000   Min.   : 0.00000  
##  1st Qu.:    0.00                1st Qu.:1.0000   1st Qu.: 0.00000  
##  Median :    0.00                Median :1.0000   Median : 0.00000  
##  Mean   :   25.14                Mean   :0.9986   Mean   : 0.04803  
##  3rd Qu.:    0.00                3rd Qu.:1.0000   3rd Qu.: 0.00000  
##  Max.   :21117.90                Max.   :1.0125   Max.   :39.00000  
##                                                                     
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   : 0.00000           Min.   :    0.00            Min.   :   1.00  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.:   2.00  
##  Median : 0.00000           Median :    0.00            Median :  44.00  
##  Mean   : 0.02346           Mean   :   16.55            Mean   :  80.48  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.: 115.00  
##  Max.   :33.00000           Max.   :25000.00            Max.   :1189.00  
## 

The first thing I notice is that many of the columns have a lot of missing data. It is not immediately clear to me whether the data for those rows is truly missing (but theoretically could have been gathered), or whether the information in those columns was simply not applicable to those rows. I will sort this out as I move through the data; in particular, I want to see whether some information is only entered once the loan has been closed or completed. First, though, I will identify the factors of interest.

Factors of Interest

From cursory research (https://en.wikipedia.org/wiki/Prosper_Marketplace), Prosper Loans appears to be a peer-to-peer lending company. The primary concern of companies is profit, and in this case, as I see no obvious measure of profit to the company itself, I will focus on profit to the lender (the lenders, presumably, keep the company in business). Of course, borrowers likewise keep the company in business, and given the measures collected, it’s possible to at least take a look at how borrower demographics influence loan funding. Variable names are cross-referenced with a document linked from the Kaggle site: https://docs.google.com/spreadsheets/d/1gDyi_L4UvIrLTEC6Wri5nbaMmkGmLQBk-Yx3z0XDEtI/edit#gid=0.

The factors of most interest to lenders, I assume, might be (for example) LoanStatus (whether a loan is in good standing, repaid, written off, etc.), LenderYield (yield minus servicing fee), EstimatedEffectiveYield (yield minus servicing fee and uncollected interest, plus late fees; likely more informative than LenderYield), EstimatedReturn (predicted difference between estimated effective yield and estimated loss), EstimatedLoss (loss on charge-offs), LoanCurrentDaysDelinquent, LP_GrossPrincipalLoss, and LP_NetPrincipalLoss. These seem most indicative of how much lenders might profit, or lose, from any particular borrower. What the lender should care most about, overall, is the ability to predict whether (or to what degree) a given (current or future) loan will pay off. In some cases, it is unclear from the documentation whether these are predictions assigned by Prosper at the outset, or descriptions of what actually happened during the course of loans. Exploring the data might shed some light on this.

On the other hand, the factors I intuitively expect might be predictive of profit are the following (for example): CreditGrade (credit assigned when the listing went live), ProsperRating (rating assigned when the loan went live), ProsperScore (risk score), EstimatedReturn (predicted difference between estimated effective yield and estimated loss), ListingCategory (what the loan is for), Occupation, EmploymentStatus, EmploymentStatusDuration, IsBorrowerHomeowner, CreditScoreRangeLower/CreditScoreRangeUpper, FirstRecordedCreditLine, CurrentCreditLines, OpenCreditLines, TotalCreditLinespast7years, OpenRevolvingAccounts, OpenRevolvingMonthlyPayment, InquiriesLast6Months, TotalInquiries, CurrentDelinquencies, AmountDelinquent, DelinquenciesLast7Years, PublicRecordsLast10Years, PublicRecordsLast12Months, RevolvingCreditBalance, BankcardUtilization, AvailableBankcardCredit, TotalTrades (number of trade lines ever opened), TradesNeverDelinquent, TradesOpenedLast6Months, DebtToIncomeRatio, IncomeRange, IncomeVerifiable, StatedMonthlyIncome, TotalProsperLoans (prior Prosper loans), TotalProsperPaymentsBilled (presumably, number of payments billed at time of listing), OnTimeProsperPayments (number of on-time payments at time of listing), ProsperPaymentsLessThanOneMonthLate, ProsperPaymentsOneMonthPlusLate, ProsperPrincipalBorrowed (amount borrowed at time of listing), ProsperPrincipalOutstanding (amount outstanding at time of listing), Recommendations (number of recommendations at time of listing), InvestmentFromFriendsCount (number of friends investing), InvestmentFromFriendsAmount (amount invested by friends), and Investors (total number of investors). There are too many of these to examine individually, so I expect to narrow the list down to a few, particularly where multiple measures reflect more or less the same thing, or show no distinct pattern of correlation with other variables.
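
One way to spot measures that reflect more or less the same thing is to screen pairwise correlations among the numeric predictors. The following is only a minimal sketch on simulated data: the toy data frame, the simulated values, and the 0.9 cutoff are assumptions for illustration, not part of the analysis. The only detail taken from the data is that CreditScoreRangeUpper sits 19 points above CreditScoreRangeLower in the summaries.

```r
# Toy stand-in for a few numeric predictors; the values are simulated, not Prosper data.
set.seed(1)
n <- 200
lower <- round(rnorm(n, mean = 700, sd = 50))
toy <- data.frame(
  CreditScoreRangeLower = lower,
  CreditScoreRangeUpper = lower + 19,   # upper bound tracks the lower bound exactly
  CurrentCreditLines    = rpois(n, lambda = 10),
  InquiriesLast6Months  = rpois(n, lambda = 1)
)

# Pairwise correlations, ignoring rows with missing values in either column.
cors <- cor(toy, use = "pairwise.complete.obs")

# List each pair once (upper triangle) whose absolute correlation exceeds the cutoff.
threshold <- 0.9   # hypothetical cutoff
idx <- which(abs(cors) > threshold & upper.tri(cors), arr.ind = TRUE)
redundant <- data.frame(
  var1 = rownames(cors)[idx[, "row"]],
  var2 = colnames(cors)[idx[, "col"]],
  r    = cors[idx]
)
redundant
```

On the real data, the same screen could be run on the numeric columns only, and one variable from each flagged pair dropped from further plots.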

With respect to loan funding, some of the same predictors likely also influence loan amounts and borrower funding, as most likely reflected by BorrowerAPR, BorrowerRate, LoanOriginalAmount, MonthlyLoanPayment, Term (the length of the loan), and PercentFunded (although this is unlikely to be informative for recently created loans).

The borrowers and loans are primarily indexed through the variables MemberKey and LoanNumber. Additional variables for keeping track of loans include LoanOriginationDate and LoanOriginationQuarter. ClosedDate is useful for quickly indexing loans which have been closed, and for which firm conclusions can be drawn as to how much lenders profited.
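
Because the str() output earlier shows fewer distinct ListingKey values than rows, it may be worth checking whether LoanNumber is actually unique before treating it as an index. The sketch below does that, and also pulls out closed loans via ClosedDate. It uses a small invented table, so the keys and dates are placeholders rather than real records.

```r
# Toy stand-in for the loan table; keys and dates are invented for illustration.
toy <- data.frame(
  LoanNumber = c(101, 102, 102, 103),
  MemberKey  = c("A", "B", "B", "C"),
  ClosedDate = as.Date(c("2012-05-01", NA, NA, "2013-01-15"))
)

# Rows whose LoanNumber already appeared earlier in the table.
n_repeated <- sum(duplicated(toy$LoanNumber))

# Closed loans are those with a non-missing ClosedDate.
closed_loans <- toy[!is.na(toy$ClosedDate), ]

n_repeated          # number of repeated LoanNumber rows
nrow(closed_loans)  # number of closed loans
```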

NA Values

Here I want to double-check why information might be missing (e.g., whether some variables are assigned only once a loan has been closed).

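# Percentage of NA values per column among closed loans (ClosedDate present)
closed <- round(colMeans(is.na(filter(data, !is.na(ClosedDate)))) * 100, 2)
# Percentage of NA values per column among loans that are still open
not_closed <- round(colMeans(is.na(filter(data, is.na(ClosedDate)))) * 100, 2)
data.frame(closed, not_closed)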
##                                     closed not_closed
## ListingKey                            0.00       0.00
## ListingNumber                         0.00       0.00
## ListingCreationDate                   0.00       0.00
## CreditGrade                          47.44     100.00
## Term                                  0.00       0.00
## LoanStatus                            0.00       0.00
## ClosedDate                            0.00     100.00
## BorrowerAPR                           0.05       0.00
## BorrowerRate                          0.00       0.00
## LenderYield                           0.00       0.00
## EstimatedEffectiveYield              52.79       0.00
## EstimatedLoss                        52.79       0.00
## EstimatedReturn                      52.79       0.00
## ProsperRating.num                    52.79       0.00
## ProsperRating.alpha                  52.79       0.00
## ProsperScore                         52.79       0.00
## ListingCategory.num                   0.00       0.00
## BorrowerState                        10.01       0.00
## Occupation                            4.12       2.24
## EmploymentStatus                      4.09       0.00
## EmploymentStatusDuration             13.82       0.02
## IsBorrowerHomeowner                   0.00       0.00
## CurrentlyInGroup                      0.00       0.00
## GroupKey                             77.00      98.86
## DateCreditPulled                      0.00       0.00
## CreditScoreRangeLower                 1.07       0.00
## CreditScoreRangeUpper                 1.07       0.00
## FirstRecordedCreditLine               1.27       0.00
## CurrentCreditLines                   13.80       0.00
## OpenCreditLines                      13.80       0.00
## TotalCreditLinespast7years            1.27       0.00
## OpenRevolvingAccounts                 0.00       0.00
## OpenRevolvingMonthlyPayment           0.00       0.00
## InquiriesLast6Months                  1.27       0.00
## TotalInquiries                        2.10       0.00
## CurrentDelinquencies                  1.27       0.00
## AmountDelinquent                     13.84       0.00
## DelinquenciesLast7Years               1.80       0.00
## PublicRecordsLast10Years              1.27       0.00
## PublicRecordsLast12Months            13.80       0.00
## RevolvingCreditBalance               13.80       0.00
## BankcardUtilization                  13.80       0.00
## AvailableBankcardCredit              13.69       0.00
## TotalTrades                          13.69       0.00
## TradesNeverDelinquent.per            13.69       0.00
## TradesOpenedLast6Months              13.69       0.00
## DebtToIncomeRatio                     7.68       7.35
## IncomeRange                           0.00       0.00
## IncomeVerifiable                      0.00       0.00
## StatedMonthlyIncome                   0.00       0.00
## LoanKey                               0.00       0.00
## TotalProsperLoans                    80.87      80.38
## TotalProsperPaymentsBilled           80.87      80.38
## OnTimeProsperPayments                80.87      80.38
## ProsperPaymentsLessThanOneMonthLate  80.87      80.38
## ProsperPaymentsOneMonthPlusLate      80.87      80.38
## ProsperPrincipalBorrowed             80.87      80.38
## ProsperPrincipalOutstanding          80.87      80.38
## ScorexChangeAtTimeOfListing          81.05      85.58
## LoanCurrentDaysDelinquent             0.00       0.00
## LoanFirstDefaultedCycleNumber        69.24      99.99
## LoanMonthsSinceOrigination            0.00       0.00
## LoanNumber                            0.00       0.00
## LoanOriginalAmount                    0.00       0.00
## LoanOriginationDate                   0.00       0.00
## LoanOriginationQuarter                0.04       0.00
## MemberKey                             0.00       0.00
## MonthlyLoanPayment                    0.00       0.00
## LP_CustomerPayments                   0.00       0.00
## LP_CustomerPrincipalPayments          0.00       0.00
## LP_InterestandFees                    0.00       0.00
## LP_ServiceFees                        0.00       0.00
## LP_CollectionFees                     0.00       0.00
## LP_GrossPrincipalLoss                 0.00       0.00
## LP_NetPrincipalLoss                   0.00       0.00
## LP_NonPrincipalRecoverypayments       0.00       0.00
## PercentFunded                         0.00       0.00
## Recommendations                       0.00       0.00
## InvestmentFromFriendsCount            0.00       0.00
## InvestmentFromFriendsAmount           0.00       0.00
## Investors                             0.00       0.00

The first thing I notice is that whether a loan is closed is strongly, though in most cases not perfectly, predictive of whether values are missing.
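
To make this pattern easier to scan, the columns could be ranked by the gap between the two percentages. The sketch below uses four rows copied from the table above; the 10-point cutoff is an arbitrary assumption chosen only to illustrate the idea.

```r
# A few rows of the closed / not_closed NA percentages, copied from the table above.
na_rates <- data.frame(
  closed     = c(47.44, 52.79, 80.87, 7.68),
  not_closed = c(100.00, 0.00, 80.38, 7.35),
  row.names  = c("CreditGrade", "EstimatedReturn",
                 "TotalProsperLoans", "DebtToIncomeRatio")
)

# Gap in percentage points between the two groups.
na_rates$gap <- na_rates$closed - na_rates$not_closed

# Keep columns whose gap exceeds a hypothetical 10-point cutoff, largest gap first.
flagged <- na_rates[abs(na_rates$gap) > 10, ]
flagged <- flagged[order(-abs(flagged$gap)), ]
flagged
```

Columns with a small gap, such as TotalProsperLoans, are missing at similar rates in both groups, so their missingness probably has a cause unrelated to loan closure.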

None of the open loans have a credit grade, while about half of the closed loans do. I assume that those which do not are post-July 2009 loans, which were never assigned a credit grade.
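
A compact way to test an assumption like this is to cross-tabulate whether CreditGrade is missing against whether the listing predates the cutoff. This is a minimal sketch on invented rows; the toy dates and grades are placeholders, and the 2009-07-01 cutoff is an assumed reading of "July 2009".

```r
# Toy stand-in for closed loans; dates and grades are invented for illustration.
toy <- data.frame(
  ListingCreationDate = as.Date(c("2007-08-26", "2008-05-26", "2009-05-06",
                                  "2010-10-12", "2012-06-14")),
  CreditGrade = c("C", NA, "HR", NA, NA)
)

cutoff <- as.Date("2009-07-01")  # assumed boundary for "July 2009"

# Rows: listed before or after the cutoff; columns: whether CreditGrade is missing.
tab <- table(
  era           = ifelse(toy$ListingCreationDate < cutoff, "pre-cutoff", "post-cutoff"),
  grade_missing = is.na(toy$CreditGrade)
)
tab
```

If the assumption held perfectly, the pre-cutoff row would have no missing grades; any count in that cell marks loans worth a closer look.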

summary(filter(data, !is.na(ClosedDate) & is.na(CreditGrade)))
##                    ListingKey    ListingNumber     ListingCreationDate 
##  018A360063948152589C8BE:    2   Min.   : 149172   Min.   :2007-06-08  
##  30F435938764424435A1188:    2   1st Qu.: 479472   1st Qu.:2010-10-12  
##  32943590099161153292459:    2   Median : 529900   Median :2011-09-28  
##  6DFC3591891372387BB41B2:    2   Mean   : 554859   Mean   :2011-08-17  
##  778D35919242972923313E0:    2   3rd Qu.: 600118   3rd Qu.:2012-06-14  
##  82FD35914405776692938D4:    2   Max.   :1204824   Max.   :2014-02-13  
##  (Other)                :26124                                         
##   CreditGrade         Term                        LoanStatus   
##  NC     :    0   Min.   :12.00   Completed             :19786  
##  HR     :    0   1st Qu.:36.00   Chargedoff            : 5342  
##  E      :    0   Median :36.00   Defaulted             : 1008  
##  D      :    0   Mean   :37.99   Cancelled             :    0  
##  C      :    0   3rd Qu.:36.00   Current               :    0  
##  (Other):    0   Max.   :60.00   FinalPaymentInProgress:    0  
##  NA's   :26136                   (Other)               :    0  
##    ClosedDate          BorrowerAPR       BorrowerRate     LenderYield    
##  Min.   :2009-08-27   Min.   :0.04583   Min.   :0.0400   Min.   :0.0300  
##  1st Qu.:2012-06-12   1st Qu.:0.17359   1st Qu.:0.1469   1st Qu.:0.1369  
##  Median :2013-02-20   Median :0.26798   Median :0.2300   Median :0.2200  
##  Mean   :2012-12-20   Mean   :0.25118   Mean   :0.2193   Mean   :0.2093  
##  3rd Qu.:2013-09-10   3rd Qu.:0.33553   3rd Qu.:0.2958   3rd Qu.:0.2858  
##  Max.   :2014-03-10   Max.   :0.42395   Max.   :0.3600   Max.   :0.3400  
##                                                                          
##  EstimatedEffectiveYield EstimatedLoss     EstimatedReturn  
##  Min.   :-0.1827         Min.   :0.00490   Min.   :-0.1827  
##  1st Qu.: 0.1106         1st Qu.:0.05200   1st Qu.: 0.0780  
##  Median : 0.1715         Median :0.09800   Median : 0.1144  
##  Mean   : 0.1762         Mean   :0.09379   Mean   : 0.1075  
##  3rd Qu.: 0.2469         3rd Qu.:0.14050   3rd Qu.: 0.1363  
##  Max.   : 0.3199         Max.   :0.36600   Max.   : 0.2837  
##  NA's   :131             NA's   :131       NA's   :131      
##  ProsperRating.num ProsperRating.alpha  ProsperScore   
##  Min.   :1.000     D      :5869        Min.   : 1.000  
##  1st Qu.:2.000     E      :3830        1st Qu.: 5.000  
##  Median :3.000     C      :3817        Median : 6.000  
##  Mean   :3.663     HR     :3725        Mean   : 6.266  
##  3rd Qu.:5.000     A      :3608        3rd Qu.: 8.000  
##  Max.   :7.000     (Other):5156        Max.   :11.000  
##  NA's   :131       NA's   : 131        NA's   :131     
##  ListingCategory.num BorrowerState                      Occupation   
##  Min.   : 0.00       CA     : 3325   Other                   : 6786  
##  1st Qu.: 1.00       FL     : 1768   Professional            : 3452  
##  Median : 2.00       NY     : 1639   Computer Programmer     : 1261  
##  Mean   : 3.75       TX     : 1562   Administrative Assistant:  959  
##  3rd Qu.: 7.00       IL     : 1389   Executive               :  950  
##  Max.   :20.00       GA     : 1127   (Other)                 :12715  
##                      (Other):15326   NA's                    :   13  
##       EmploymentStatus EmploymentStatusDuration IsBorrowerHomeowner
##  Employed     :16491   Min.   :  0.00           Mode :logical      
##  Full-time    : 6634   1st Qu.: 27.00           FALSE:12814        
##  Self-employed: 1334   Median : 63.00           TRUE :13322        
##  Other        :  798   Mean   : 91.06                              
##  Not employed :  375   3rd Qu.:127.00                              
##  Retired      :  273   Max.   :755.00                              
##  (Other)      :  231   NA's   :9                                   
##  CurrentlyInGroup                    GroupKey     DateCreditPulled    
##  Mode :logical    3D4D3366260257624AB272D:  201   Min.   :2009-07-13  
##  FALSE:24741      783C3371218786870A73D20:  134   1st Qu.:2010-10-13  
##  TRUE :1395       52EA3425051368132B80C96:  109   Median :2011-09-29  
##                   B0473364376920128370B13:   63   Mean   :2011-08-21  
##                   FEF83377364176536637E50:   54   3rd Qu.:2012-06-14  
##                   (Other)                :  817   Max.   :2014-02-13  
##                   NA's                   :24758                       
##  CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
##  Min.   :600.0         Min.   :619.0         Min.   :1953-09-01     
##  1st Qu.:660.0         1st Qu.:679.0         1st Qu.:1990-12-03     
##  Median :700.0         Median :719.0         Median :1996-04-16     
##  Mean   :701.7         Mean   :720.7         Mean   :1995-04-06     
##  3rd Qu.:740.0         3rd Qu.:759.0         3rd Qu.:2000-05-19     
##  Max.   :880.0         Max.   :899.0         Max.   :2012-06-19     
##                                                                     
##  CurrentCreditLines OpenCreditLines  TotalCreditLinespast7years
##  Min.   : 0.000     Min.   : 0.000   Min.   :  2.0             
##  1st Qu.: 6.000     1st Qu.: 5.000   1st Qu.: 16.0             
##  Median : 9.000     Median : 8.000   Median : 25.0             
##  Mean   : 9.576     Mean   : 8.454   Mean   : 26.6             
##  3rd Qu.:13.000     3rd Qu.:11.000   3rd Qu.: 35.0             
##  Max.   :59.000     Max.   :48.000   Max.   :124.0             
##                                                                
##  OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
##  Min.   : 0.000        Min.   :   0.0              Min.   : 0.000      
##  1st Qu.: 3.000        1st Qu.:  97.0              1st Qu.: 0.000      
##  Median : 6.000        Median : 231.0              Median : 1.000      
##  Mean   : 6.442        Mean   : 349.2              Mean   : 1.188      
##  3rd Qu.: 9.000        3rd Qu.: 457.0              3rd Qu.: 2.000      
##  Max.   :47.000        Max.   :5720.0              Max.   :27.000      
##                                                                        
##  TotalInquiries   CurrentDelinquencies AmountDelinquent  
##  Min.   : 0.000   Min.   : 0.0000      Min.   :     0.0  
##  1st Qu.: 2.000   1st Qu.: 0.0000      1st Qu.:     0.0  
##  Median : 4.000   Median : 0.0000      Median :     0.0  
##  Mean   : 4.646   Mean   : 0.3694      Mean   :   992.6  
##  3rd Qu.: 6.000   3rd Qu.: 0.0000      3rd Qu.:     0.0  
##  Max.   :74.000   Max.   :32.0000      Max.   :327677.0  
##                                                          
##  DelinquenciesLast7Years PublicRecordsLast10Years
##  Min.   : 0.000          Min.   : 0.0000         
##  1st Qu.: 0.000          1st Qu.: 0.0000         
##  Median : 0.000          Median : 0.0000         
##  Mean   : 3.401          Mean   : 0.2609         
##  3rd Qu.: 2.000          3rd Qu.: 0.0000         
##  Max.   :99.000          Max.   :12.0000         
##                                                  
##  PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
##  Min.   :0.00000           Min.   :     0         Min.   :0.0000     
##  1st Qu.:0.00000           1st Qu.:  2071         1st Qu.:0.2200     
##  Median :0.00000           Median :  6798         Median :0.5400     
##  Mean   :0.01144           Mean   : 15210         Mean   :0.5141     
##  3rd Qu.:0.00000           3rd Qu.: 16600         3rd Qu.:0.8100     
##  Max.   :4.00000           Max.   :879785         Max.   :2.5000     
##                                                                      
##  AvailableBankcardCredit  TotalTrades     TradesNeverDelinquent.per
##  Min.   :     0.0        Min.   :  1.00   Min.   :0.1600           
##  1st Qu.:   850.8        1st Qu.: 14.00   1st Qu.:0.8300           
##  Median :  4198.0        Median : 21.00   Median :0.9500           
##  Mean   : 11174.3        Mean   : 22.87   Mean   :0.8973           
##  3rd Qu.: 13414.0        3rd Qu.: 30.00   3rd Qu.:1.0000           
##  Max.   :412785.0        Max.   :122.00   Max.   :1.0000           
##                                                                    
##  TradesOpenedLast6Months DebtToIncomeRatio         IncomeRange  
##  Min.   : 0.0000         Min.   : 0.0000   $25,000-49,999:8367  
##  1st Qu.: 0.0000         1st Qu.: 0.1300   $50,000-74,999:7411  
##  Median : 0.0000         Median : 0.2000   $75,000-99,999:4041  
##  Mean   : 0.7603         Mean   : 0.2488   $100,000+     :3948  
##  3rd Qu.: 1.0000         3rd Qu.: 0.3000   $1-24,999     :1964  
##  Max.   :20.0000         Max.   :10.0100   Not employed  : 375  
##                          NA's   :2983      (Other)       :  30  
##  IncomeVerifiable StatedMonthlyIncome                    LoanKey     
##  Mode :logical    Min.   :     0      08C43696561586194AC381C:    2  
##  FALSE:2976       1st Qu.:  3167      09303699897852595CD59DD:    2  
##  TRUE :23160      Median :  4583      114D37056655628721BD6C8:    2  
##                   Mean   :  5488      156836977849742636AE34F:    2  
##                   3rd Qu.:  6667      56D73700259224545E36FBC:    2  
##                   Max.   :618548      63113695530739927C7EA06:    2  
##                                       (Other)                :26124  
##  TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :0.000     Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:1.000     1st Qu.:  9.00             1st Qu.:  9.00       
##  Median :1.000     Median : 18.00             Median : 18.00       
##  Mean   :1.401     Mean   : 22.57             Mean   : 21.88       
##  3rd Qu.:2.000     3rd Qu.: 33.00             3rd Qu.: 32.00       
##  Max.   :7.000     Max.   :120.00             Max.   :114.00       
##  NA's   :17826     NA's   :17826              NA's   :17826        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.000                      Min.   : 0.000                 
##  1st Qu.: 0.000                      1st Qu.: 0.000                 
##  Median : 0.000                      Median : 0.000                 
##  Mean   : 0.635                      Mean   : 0.058                 
##  3rd Qu.: 0.000                      3rd Qu.: 0.000                 
##  Max.   :42.000                      Max.   :21.000                 
##  NA's   :17826                       NA's   :17826                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   :    0            Min.   :    0.0            
##  1st Qu.: 3000            1st Qu.:    0.0            
##  Median : 5000            Median :  824.7            
##  Mean   : 7394            Mean   : 2127.9            
##  3rd Qu.:10000            3rd Qu.: 3179.1            
##  Max.   :60001            Max.   :22586.7            
##  NA's   :17826            NA's   :17826              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-194.00             Min.   :   0.0           
##  1st Qu.: -32.00             1st Qu.:   0.0           
##  Median :  -3.00             Median :   0.0           
##  Mean   :  -0.29             Mean   : 115.9           
##  3rd Qu.:  29.00             3rd Qu.:   0.0           
##  Max.   : 286.00             Max.   :1593.0           
##  NA's   :17923                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 1.00                 Min.   : 1.00              Min.   : 38045  
##  1st Qu.: 9.00                 1st Qu.:21.00              1st Qu.: 45089  
##  Median :13.00                 Median :29.00              Median : 54430  
##  Mean   :14.49                 Mean   :30.47              Mean   : 58559  
##  3rd Qu.:19.00                 3rd Qu.:41.00              3rd Qu.: 68482  
##  Max.   :41.00                 Max.   :56.00              Max.   :132453  
##  NA's   :19891                                                            
##  LoanOriginalAmount LoanOriginationDate  LoanOriginationQuarter
##  Min.   : 1000      Min.   :2009-07-20   Q4 2011: 2352         
##  1st Qu.: 3000      1st Qu.:2010-10-29   Q2 2012: 2272         
##  Median : 4500      Median :2011-10-12   Q1 2012: 2252         
##  Mean   : 6365      Mean   :2011-09-03   Q3 2012: 2213         
##  3rd Qu.: 8000      3rd Qu.:2012-06-25   Q3 2011: 2018         
##  Max.   :35000      Max.   :2014-02-21   Q2 2011: 1713         
##                                          (Other):13316         
##                    MemberKey     MonthlyLoanPayment LP_CustomerPayments
##  C70934206057523078260C7:    7   Min.   :   0.0     Min.   :   -2.35   
##  E4AF3422677498955FFA00E:    7   1st Qu.: 121.6     1st Qu.: 2304.53   
##  720D3508651090808DC328F:    6   Median : 175.9     Median : 4561.31   
##  D65B3496915385104F50CD7:    6   Mean   : 232.2     Mean   : 6193.82   
##  E48334334509567416C8C65:    6   3rd Qu.: 314.4     3rd Qu.: 8501.98   
##  43DB3366978035224D7D9E3:    5   Max.   :2251.5     Max.   :37369.16   
##  (Other)                :26099                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0                Min.   :   -2.35   Min.   :-589.95  
##  1st Qu.: 1795                1st Qu.:  326.71   1st Qu.: -70.74  
##  Median : 4000                Median :  746.15   Median : -35.07  
##  Mean   : 5128                Mean   : 1065.72   Mean   : -52.18  
##  3rd Qu.: 7000                3rd Qu.: 1487.20   3rd Qu.: -16.07  
##  Max.   :35000                Max.   :10013.57   Max.   :   3.01  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-4865.08   Min.   :  -94.2       Min.   : -504.4    
##  1st Qu.:    0.00   1st Qu.:    0.0       1st Qu.:    0.0    
##  Median :    0.00   Median :    0.0       Median :    0.0    
##  Mean   :  -17.25   Mean   : 1221.7       Mean   : 1194.6    
##  3rd Qu.:    0.00   3rd Qu.:    0.0       3rd Qu.:    0.0    
##  Max.   :    0.00   Max.   :25000.0       Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded   Recommendations   
##  Min.   :   0.00                 Min.   :0.700   Min.   : 0.00000  
##  1st Qu.:   0.00                 1st Qu.:1.000   1st Qu.: 0.00000  
##  Median :   0.00                 Median :1.000   Median : 0.00000  
##  Mean   :  24.83                 Mean   :0.997   Mean   : 0.03646  
##  3rd Qu.:   0.00                 3rd Qu.:1.000   3rd Qu.: 0.00000  
##  Max.   :7780.03                 Max.   :1.000   Max.   :18.00000  
##                                                                    
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors      
##  Min.   :0.00000            Min.   :    0.00            Min.   :   1.00  
##  1st Qu.:0.00000            1st Qu.:    0.00            1st Qu.:  28.00  
##  Median :0.00000            Median :    0.00            Median :  62.00  
##  Mean   :0.02124            Mean   :   12.94            Mean   :  92.67  
##  3rd Qu.:0.00000            3rd Qu.:    0.00            3rd Qu.: 125.00  
##  Max.   :9.00000            Max.   :11000.00            Max.   :1189.00  
## 

Here, I see that at least one closed loan listed before July 2009 has no credit grade: the earliest ListingCreationDate in this subset is 2007-06-08.

summary(filter(data, !is.na(ClosedDate) & is.na(CreditGrade) & ListingCreationDate < "2009-07-01"))
##                    ListingKey  ListingNumber    ListingCreationDate 
##  0385345033494662260733C:  1   Min.   :149172   Min.   :2007-06-08  
##  04D73431953660481B1EC1D:  1   1st Qu.:306608   1st Qu.:2008-04-08  
##  04F334232790941784498F1:  1   Median :339464   Median :2008-05-26  
##  05153419481232978723A5F:  1   Mean   :341138   Mean   :2008-06-24  
##  059934165217732065237C5:  1   3rd Qu.:397924   3rd Qu.:2008-09-13  
##  06FF342963152332574DF05:  1   Max.   :415961   Max.   :2009-05-06  
##  (Other)                :125                                        
##   CreditGrade       Term                        LoanStatus 
##  NC     :  0   Min.   :12.00   Completed             :122  
##  HR     :  0   1st Qu.:36.00   Chargedoff            :  6  
##  E      :  0   Median :36.00   Defaulted             :  3  
##  D      :  0   Mean   :35.82   Cancelled             :  0  
##  C      :  0   3rd Qu.:36.00   Current               :  0  
##  (Other):  0   Max.   :36.00   FinalPaymentInProgress:  0  
##  NA's   :131                   (Other)               :  0  
##    ClosedDate          BorrowerAPR       BorrowerRate    
##  Min.   :2010-01-28   Min.   :0.06207   Min.   :0.05870  
##  1st Qu.:2011-04-21   1st Qu.:0.11271   1st Qu.:0.09025  
##  Median :2012-04-05   Median :0.17018   Median :0.14000  
##  Mean   :2012-02-01   Mean   :0.18688   Mean   :0.16300  
##  3rd Qu.:2012-10-29   3rd Qu.:0.25811   3rd Qu.:0.22700  
##  Max.   :2013-10-12   Max.   :0.39460   Max.   :0.35300  
##                                                          
##   LenderYield      EstimatedEffectiveYield EstimatedLoss EstimatedReturn
##  Min.   :0.04870   Min.   : NA             Min.   : NA   Min.   : NA    
##  1st Qu.:0.08025   1st Qu.: NA             1st Qu.: NA   1st Qu.: NA    
##  Median :0.13000   Median : NA             Median : NA   Median : NA    
##  Mean   :0.15293   Mean   :NaN             Mean   :NaN   Mean   :NaN    
##  3rd Qu.:0.21700   3rd Qu.: NA             3rd Qu.: NA   3rd Qu.: NA    
##  Max.   :0.34000   Max.   : NA             Max.   : NA   Max.   : NA    
##                    NA's   :131             NA's   :131   NA's   :131    
##  ProsperRating.num ProsperRating.alpha  ProsperScore ListingCategory.num
##  Min.   : NA       NC     :  0         Min.   : NA   Min.   :1.000      
##  1st Qu.: NA       HR     :  0         1st Qu.: NA   1st Qu.:1.000      
##  Median : NA       E      :  0         Median : NA   Median :1.000      
##  Mean   :NaN       D      :  0         Mean   :NaN   Mean   :2.893      
##  3rd Qu.: NA       C      :  0         3rd Qu.: NA   3rd Qu.:5.000      
##  Max.   : NA       (Other):  0         Max.   : NA   Max.   :7.000      
##  NA's   :131       NA's   :131         NA's   :131                      
##  BorrowerState                    Occupation      EmploymentStatus
##  CA     :18    Other                   :30   Full-time    :104    
##  TX     :18    Professional            :23   Employed     : 12    
##  NY     : 9    Analyst                 : 9   Part-time    :  7    
##  IL     : 7    Computer Programmer     : 9   Retired      :  4    
##  CT     : 6    Administrative Assistant: 5   Self-employed:  4    
##  MN     : 6    Teacher                 : 5   Not available:  0    
##  (Other):67    (Other)                 :50   (Other)      :  0    
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           Mode :logical       Mode :logical   
##  1st Qu.: 26.00           FALSE:66            FALSE:107       
##  Median : 50.00           TRUE :65            TRUE :24        
##  Mean   : 74.24                                               
##  3rd Qu.:105.00                                               
##  Max.   :472.00                                               
##                                                               
##                     GroupKey   DateCreditPulled     CreditScoreRangeLower
##  783C3371218786870A73D20:  5   Min.   :2009-07-13   Min.   :600.0        
##  020E3366126106360DB9421:  1   1st Qu.:2009-10-19   1st Qu.:660.0        
##  17693364417023401A53169:  1   Median :2010-02-03   Median :720.0        
##  18DA336463918236939DCE7:  1   Mean   :2010-02-23   Mean   :711.1        
##  3D4D3366260257624AB272D:  1   3rd Qu.:2010-07-02   3rd Qu.:740.0        
##  (Other)                : 15   Max.   :2010-12-19   Max.   :860.0        
##  NA's                   :107                                             
##  CreditScoreRangeUpper FirstRecordedCreditLine CurrentCreditLines
##  Min.   :619.0         Min.   :1959-10-01      Min.   : 1.00     
##  1st Qu.:679.0         1st Qu.:1992-12-11      1st Qu.: 7.00     
##  Median :739.0         Median :1996-08-28      Median : 9.00     
##  Mean   :730.1         Mean   :1995-06-17      Mean   :10.27     
##  3rd Qu.:759.0         3rd Qu.:2000-04-07      3rd Qu.:13.00     
##  Max.   :879.0         Max.   :2007-09-10      Max.   :35.00     
##                                                                  
##  OpenCreditLines  TotalCreditLinespast7years OpenRevolvingAccounts
##  Min.   : 1.000   Min.   : 4.00              Min.   : 0.000       
##  1st Qu.: 5.000   1st Qu.:17.00              1st Qu.: 4.000       
##  Median : 8.000   Median :22.00              Median : 6.000       
##  Mean   : 8.832   Mean   :25.51              Mean   : 6.855       
##  3rd Qu.:12.000   3rd Qu.:33.00              3rd Qu.: 9.000       
##  Max.   :29.000   Max.   :58.00              Max.   :29.000       
##                                                                   
##  OpenRevolvingMonthlyPayment InquiriesLast6Months TotalInquiries  
##  Min.   :   0.0              Min.   :0.000        Min.   : 0.000  
##  1st Qu.:  90.5              1st Qu.:0.000        1st Qu.: 2.000  
##  Median : 239.0              Median :0.000        Median : 4.000  
##  Mean   : 309.1              Mean   :0.855        Mean   : 5.191  
##  3rd Qu.: 420.0              3rd Qu.:1.000        3rd Qu.: 8.000  
##  Max.   :1956.0              Max.   :9.000        Max.   :19.000  
##                                                                   
##  CurrentDelinquencies AmountDelinquent  DelinquenciesLast7Years
##  Min.   :0.0000       Min.   :    0.0   Min.   : 0.000         
##  1st Qu.:0.0000       1st Qu.:    0.0   1st Qu.: 0.000         
##  Median :0.0000       Median :    0.0   Median : 0.000         
##  Mean   :0.2824       Mean   :  433.7   Mean   : 2.718         
##  3rd Qu.:0.0000       3rd Qu.:    0.0   3rd Qu.: 0.000         
##  Max.   :8.0000       Max.   :31919.0   Max.   :43.000         
##                                                                
##  PublicRecordsLast10Years PublicRecordsLast12Months RevolvingCreditBalance
##  Min.   :0.0000           Min.   :0                 Min.   :    0         
##  1st Qu.:0.0000           1st Qu.:0                 1st Qu.: 2308         
##  Median :0.0000           Median :0                 Median : 8074         
##  Mean   :0.1756           Mean   :0                 Mean   :12039         
##  3rd Qu.:0.0000           3rd Qu.:0                 3rd Qu.:16422         
##  Max.   :3.0000           Max.   :0                 Max.   :97290         
##                                                                           
##  BankcardUtilization AvailableBankcardCredit  TotalTrades   
##  Min.   :0.0000      Min.   :     0          Min.   : 3.00  
##  1st Qu.:0.1800      1st Qu.:  1557          1st Qu.:14.50  
##  Median :0.4400      Median :  6999          Median :19.00  
##  Mean   :0.4524      Mean   : 13522          Mean   :22.21  
##  3rd Qu.:0.7200      3rd Qu.: 17470          3rd Qu.:29.00  
##  Max.   :0.9900      Max.   :110117          Max.   :52.00  
##                                                             
##  TradesNeverDelinquent.per TradesOpenedLast6Months DebtToIncomeRatio
##  Min.   :0.3000            Min.   :0.0000          Min.   :0.0200   
##  1st Qu.:0.8400            1st Qu.:0.0000          1st Qu.:0.1100   
##  Median :0.9600            Median :0.0000          Median :0.2000   
##  Mean   :0.8996            Mean   :0.5725          Mean   :0.2500   
##  3rd Qu.:1.0000            3rd Qu.:1.0000          3rd Qu.:0.2725   
##  Max.   :1.0000            Max.   :5.0000          Max.   :5.5900   
##                                                    NA's   :11       
##          IncomeRange IncomeVerifiable StatedMonthlyIncome
##  $50,000-74,999:45   Mode :logical    Min.   :  212.8    
##  $25,000-49,999:40   FALSE:11         1st Qu.: 3333.3    
##  $75,000-99,999:17   TRUE :120        Median : 4616.7    
##  $100,000+     :16                    Mean   : 5111.2    
##  $1-24,999     :13                    3rd Qu.: 6375.0    
##  Not displayed : 0                    Max.   :20833.3    
##  (Other)       : 0                                       
##                     LoanKey    TotalProsperLoans
##  003C35735230494626ADB02:  1   Min.   :1.000    
##  02CA35638190585257E0D22:  1   1st Qu.:1.000    
##  030B35936026115966F4EA0:  1   Median :1.000    
##  032A357638786716375DFFB:  1   Mean   :1.153    
##  040235782802629332A0C8C:  1   3rd Qu.:1.000    
##  05BC35722810324548A02FE:  1   Max.   :3.000    
##  (Other)                :125   NA's   :72       
##  TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   : 1.00              Min.   : 0.00        
##  1st Qu.:14.50              1st Qu.:14.50        
##  Median :24.00              Median :22.00        
##  Mean   :22.76              Mean   :22.54        
##  3rd Qu.:34.00              3rd Qu.:33.50        
##  Max.   :42.00              Max.   :41.00        
##  NA's   :72                 NA's   :72           
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   :0.0000                      Min.   :0                      
##  1st Qu.:0.0000                      1st Qu.:0                      
##  Median :0.0000                      Median :0                      
##  Mean   :0.2203                      Mean   :0                      
##  3rd Qu.:0.0000                      3rd Qu.:0                      
##  Max.   :3.0000                      Max.   :0                      
##  NA's   :72                          NA's   :72                     
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   : 1000            Min.   :   0.00            
##  1st Qu.: 1775            1st Qu.:   0.00            
##  Median : 4500            Median :   0.00            
##  Mean   : 5491            Mean   : 428.24            
##  3rd Qu.: 7500            3rd Qu.:   0.25            
##  Max.   :27000            Max.   :5788.52            
##  NA's   :72               NA's   :72                 
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-50.00              Min.   :   0.00          
##  1st Qu.: -7.00              1st Qu.:   0.00          
##  Median : 39.00              Median :   0.00          
##  Mean   : 43.37              Mean   :  53.65          
##  3rd Qu.: 83.00              3rd Qu.:   0.00          
##  Max.   :215.00              Max.   :1257.00          
##  NA's   :74                                           
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber   
##  Min.   :10.00                 Min.   :39.00              Min.   :38046  
##  1st Qu.:18.00                 1st Qu.:44.00              1st Qu.:39344  
##  Median :23.00                 Median :49.00              Median :40869  
##  Mean   :24.22                 Mean   :48.34              Mean   :41386  
##  3rd Qu.:32.00                 3rd Qu.:52.00              3rd Qu.:43474  
##  Max.   :37.00                 Max.   :56.00              Max.   :46378  
##  NA's   :122                                                             
##  LoanOriginalAmount LoanOriginationDate  LoanOriginationQuarter
##  Min.   : 1000      Min.   :2009-07-22   Q4 2009:32            
##  1st Qu.: 2000      1st Qu.:2009-11-08   Q3 2009:26            
##  Median : 3000      Median :2010-02-17   Q2 2010:21            
##  Mean   : 4187      Mean   :2010-03-11   Q4 2010:21            
##  3rd Qu.: 5000      3rd Qu.:2010-07-18   Q1 2010:17            
##  Max.   :15000      Max.   :2010-12-30   Q3 2010:14            
##                                          (Other): 0            
##                    MemberKey   MonthlyLoanPayment LP_CustomerPayments
##  010B33941340101099BFE47:  1   Min.   :  0.00     Min.   :  458.2    
##  016533808792025682035EE:  1   1st Qu.: 63.24     1st Qu.: 2161.4    
##  0CCD3420393708396FB7287:  1   Median :111.95     Median : 3865.5    
##  0F1733815422230679CFC01:  1   Mean   :146.00     Mean   : 4865.0    
##  0F5133834635103374519DF:  1   3rd Qu.:188.66     3rd Qu.: 6402.7    
##  10D73380714543112C251DF:  1   Max.   :578.69     Max.   :18748.2    
##  (Other)                :125                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :  204.8              Min.   :  11.26    Min.   :-242.93  
##  1st Qu.: 1946.1              1st Qu.: 254.88    1st Qu.: -62.53  
##  Median : 3000.0              Median : 546.00    Median : -38.67  
##  Mean   : 4043.8              Mean   : 821.17    Mean   : -50.11  
##  3rd Qu.: 5000.0              3rd Qu.:1143.52    3rd Qu.: -19.86  
##  Max.   :15000.0              Max.   :3748.19    Max.   :  -1.41  
##                                                                   
##  LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :0         Min.   :   0.0        Min.   :   0.0     
##  1st Qu.:0         1st Qu.:   0.0        1st Qu.:   0.0     
##  Median :0         Median :   0.0        Median :   0.0     
##  Mean   :0         Mean   : 145.4        Mean   : 145.4     
##  3rd Qu.:0         3rd Qu.:   0.0        3rd Qu.:   0.0     
##  Max.   :0         Max.   :8911.2        Max.   :8911.2     
##                                                             
##  LP_NonPrincipalRecoverypayments PercentFunded Recommendations  
##  Min.   :0                       Min.   :1     Min.   :0.00000  
##  1st Qu.:0                       1st Qu.:1     1st Qu.:0.00000  
##  Median :0                       Median :1     Median :0.00000  
##  Mean   :0                       Mean   :1     Mean   :0.08397  
##  3rd Qu.:0                       3rd Qu.:1     3rd Qu.:0.00000  
##  Max.   :0                       Max.   :1     Max.   :2.00000  
##                                                                 
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors    
##  Min.   :0.00000            Min.   :   0.00             Min.   : 10.0  
##  1st Qu.:0.00000            1st Qu.:   0.00             1st Qu.: 75.5  
##  Median :0.00000            Median :   0.00             Median :124.0  
##  Mean   :0.03817            Mean   :  57.97             Mean   :155.5  
##  3rd Qu.:0.00000            3rd Qu.:   0.00             3rd Qu.:204.0  
##  Max.   :1.00000            Max.   :5140.00             Max.   :594.0  
## 

I see that 131 loans are missing a credit grade for no apparent reason. I don’t see any pattern here, and I can’t tell from the data alone why these values are missing. However, this is a very small share of the data: 131 of 113,937 loans, or about 0.1%.
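As a rough check on how small this group is, the count and share of loans with no rating can be computed directly. This is a minimal sketch, assuming that "missing a credit grade" means having neither a CreditGrade nor a ProsperRating.alpha. The `toy` tibble stands in for `data`, and its values are invented for illustration only.

```r
library(dplyr)

# Toy stand-in for `data`; values are invented for illustration only.
toy <- tibble(
  CreditGrade         = c("C", NA, NA, "HR", NA),
  ProsperRating.alpha = c(NA, "A", NA, NA, "B")
)

# Count loans that carry neither a CreditGrade nor a ProsperRating,
# and express that count as a share of all loans.
missing_rating <- toy %>%
  summarise(
    n_missing     = sum(is.na(CreditGrade) & is.na(ProsperRating.alpha)),
    share_missing = mean(is.na(CreditGrade) & is.na(ProsperRating.alpha))
  )

missing_rating
```

Replacing `toy` with `data` should give the corresponding figures for the full dataset, provided the column names match those in the summaries above.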

I am otherwise assuming that CreditGrade was effectively replaced by ProsperRating in 2009, and that the two can be used more or less interchangeably, particularly given that their labels correspond.

Next, I notice that only about half of the closed loans have an estimated effective yield, or the other estimates of yield and loss, even though they are closed. I assume the loans missing these estimates are pre-July 2009 listings, but I want to take a closer look at them.

summary(filter(data, !is.na(ClosedDate) & is.na(EstimatedEffectiveYield)))
##                    ListingKey    ListingNumber    ListingCreationDate 
##  00033425227988088FA6752:    1   Min.   :     4   Min.   :2005-11-09  
##  000433785890431972B4743:    1   1st Qu.: 92588   1st Qu.:2007-02-02  
##  00083422661625108817246:    1   Median :199844   Median :2007-09-10  
##  000A34209897973969CFA81:    1   Mean   :201960   Mean   :2007-08-26  
##  000D3410451511356B08F17:    1   3rd Qu.:314319   3rd Qu.:2008-04-19  
##  00143395229257559A91663:    1   Max.   :415961   Max.   :2009-05-06  
##  (Other)                :29078                                        
##   CreditGrade        Term                     LoanStatus   
##  C      :5649   Min.   :12   Completed             :18410  
##  D      :5153   1st Qu.:36   Chargedoff            : 6656  
##  B      :4389   Median :36   Defaulted             : 4013  
##  AA     :3509   Mean   :36   Cancelled             :    5  
##  HR     :3508   3rd Qu.:36   Current               :    0  
##  (Other):6745   Max.   :36   FinalPaymentInProgress:    0  
##  NA's   : 131                (Other)               :    0  
##    ClosedDate          BorrowerAPR       BorrowerRate     LenderYield     
##  Min.   :2005-11-25   Min.   :0.00653   Min.   :0.0000   Min.   :-0.0100  
##  1st Qu.:2008-08-25   1st Qu.:0.13705   1st Qu.:0.1269   1st Qu.: 0.1170  
##  Median :2009-08-17   Median :0.18224   Median :0.1700   Median : 0.1600  
##  Mean   :2009-07-30   Mean   :0.19596   Mean   :0.1833   Mean   : 0.1730  
##  3rd Qu.:2010-07-29   3rd Qu.:0.24753   3rd Qu.:0.2364   3rd Qu.: 0.2224  
##  Max.   :2013-10-12   Max.   :0.51229   Max.   :0.4975   Max.   : 0.4925  
##                       NA's   :25                                          
##  EstimatedEffectiveYield EstimatedLoss   EstimatedReturn ProsperRating.num
##  Min.   : NA             Min.   : NA     Min.   : NA     Min.   : NA      
##  1st Qu.: NA             1st Qu.: NA     1st Qu.: NA     1st Qu.: NA      
##  Median : NA             Median : NA     Median : NA     Median : NA      
##  Mean   :NaN             Mean   :NaN     Mean   :NaN     Mean   :NaN      
##  3rd Qu.: NA             3rd Qu.: NA     3rd Qu.: NA     3rd Qu.: NA      
##  Max.   : NA             Max.   : NA     Max.   : NA     Max.   : NA      
##  NA's   :29084           NA's   :29084   NA's   :29084   NA's   :29084    
##  ProsperRating.alpha  ProsperScore   ListingCategory.num BorrowerState  
##  NC     :    0       Min.   : NA     Min.   :0.000       CA     : 3956  
##  HR     :    0       1st Qu.: NA     1st Qu.:0.000       GA     : 1661  
##  E      :    0       Median : NA     Median :0.000       IL     : 1657  
##  D      :    0       Mean   :NaN     Mean   :1.203       FL     : 1314  
##  C      :    0       3rd Qu.: NA     3rd Qu.:1.000       TX     : 1208  
##  (Other):    0       Max.   : NA     Max.   :7.000       (Other):13773  
##  NA's   :29084       NA's   :29084                       NA's   : 5515  
##                Occupation         EmploymentStatus
##  Other              : 7300   Full-time    :18428  
##  Professional       : 3086   Not available: 5347  
##  Computer Programmer: 1242   Self-employed: 1596  
##  Sales - Commission : 1096   Part-time    :  832  
##  Clerical           : 1048   Retired      :  428  
##  (Other)            :13057   (Other)      :  198  
##  NA's               : 2255   NA's         : 2255  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.00           Mode :logical       Mode :logical   
##  1st Qu.: 15.00           FALSE:16454         FALSE:18611     
##  Median : 40.00           TRUE :12630         TRUE :10473     
##  Mean   : 68.49                                               
##  3rd Qu.: 94.00                                               
##  Max.   :623.00                                               
##  NA's   :7606                                                 
##                     GroupKey     DateCreditPulled    
##  783C3371218786870A73D20:  932   Min.   :2005-11-09  
##  6A3B336601725506917317E:  619   1st Qu.:2007-01-30  
##  3D4D3366260257624AB272D:  606   Median :2007-09-04  
##  FEF83377364176536637E50:  529   Mean   :2007-08-24  
##  C9643379247860156A00EC0:  342   3rd Qu.:2008-04-17  
##  (Other)                : 8287   Max.   :2010-12-19  
##  NA's                   :17769                       
##  CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
##  Min.   :  0.0         Min.   : 19.0         Min.   :1947-08-24     
##  1st Qu.:600.0         1st Qu.:619.0         1st Qu.:1990-07-26     
##  Median :640.0         Median :659.0         Median :1995-06-01     
##  Mean   :644.4         Mean   :663.4         Mean   :1994-08-07     
##  3rd Qu.:700.0         3rd Qu.:719.0         3rd Qu.:1999-08-31     
##  Max.   :880.0         Max.   :899.0         Max.   :2008-07-01     
##  NA's   :591           NA's   :591           NA's   :697            
##  CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
##  Min.   : 0.000     Min.   : 0.0    Min.   :  2.00            
##  1st Qu.: 5.000     1st Qu.: 4.0    1st Qu.: 13.00            
##  Median : 9.000     Median : 7.0    Median : 22.00            
##  Mean   : 9.563     Mean   : 8.2    Mean   : 24.06            
##  3rd Qu.:13.000     3rd Qu.:11.0    3rd Qu.: 32.00            
##  Max.   :52.000     Max.   :51.0    Max.   :136.00            
##  NA's   :7604       NA's   :7604    NA's   :697               
##  OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
##  Min.   : 0.000        Min.   :    0.0             Min.   :  0.000     
##  1st Qu.: 2.000        1st Qu.:   35.0             1st Qu.:  0.000     
##  Median : 5.000        Median :  139.0             Median :  2.000     
##  Mean   : 5.755        Mean   :  303.7             Mean   :  2.841     
##  3rd Qu.: 8.000        3rd Qu.:  374.0             3rd Qu.:  4.000     
##  Max.   :51.000        Max.   :14985.0             Max.   :105.000     
##                                                    NA's   :697         
##  TotalInquiries    CurrentDelinquencies AmountDelinquent
##  Min.   :  0.000   Min.   : 0.000       Min.   :     0  
##  1st Qu.:  3.000   1st Qu.: 0.000       1st Qu.:     0  
##  Median :  7.000   Median : 0.000       Median :     0  
##  Mean   :  9.516   Mean   : 1.398       Mean   :  1118  
##  3rd Qu.: 13.000   3rd Qu.: 1.000       3rd Qu.:    30  
##  Max.   :379.000   Max.   :83.000       Max.   :444745  
##  NA's   :1159      NA's   :697          NA's   :7622    
##  DelinquenciesLast7Years PublicRecordsLast10Years
##  Min.   : 0.000          Min.   : 0.0000         
##  1st Qu.: 0.000          1st Qu.: 0.0000         
##  Median : 0.000          Median : 0.0000         
##  Mean   : 5.652          Mean   : 0.3949         
##  3rd Qu.: 6.000          3rd Qu.: 1.0000         
##  Max.   :99.000          Max.   :30.0000         
##  NA's   :990             NA's   :697             
##  PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
##  Min.   :0.000             Min.   :      0        Min.   :0.00       
##  1st Qu.:0.000             1st Qu.:   1192        1st Qu.:0.20       
##  Median :0.000             Median :   5206        Median :0.60       
##  Mean   :0.039             Mean   :  16250        Mean   :0.55       
##  3rd Qu.:0.000             3rd Qu.:  15590        3rd Qu.:0.88       
##  Max.   :7.000             Max.   :1435667        Max.   :5.95       
##  NA's   :7604              NA's   :7604           NA's   :7604       
##  AvailableBankcardCredit  TotalTrades     TradesNeverDelinquent.per
##  Min.   :     0          Min.   :  0.00   Min.   :0.000            
##  1st Qu.:   253          1st Qu.: 11.00   1st Qu.:0.690            
##  Median :  2277          Median : 18.00   Median :0.870            
##  Mean   : 10460          Mean   : 20.48   Mean   :0.807            
##  3rd Qu.: 10162          3rd Qu.: 28.00   3rd Qu.:1.000            
##  Max.   :646285          Max.   :126.00   Max.   :1.000            
##  NA's   :7544            NA's   :7544     NA's   :7544             
##  TradesOpenedLast6Months DebtToIncomeRatio         IncomeRange  
##  Min.   : 0.000          Min.   : 0.0000   $25,000-49,999:8017  
##  1st Qu.: 0.000          1st Qu.: 0.1200   Not displayed :7741  
##  Median : 1.000          Median : 0.2000   $50,000-74,999:5423  
##  Mean   : 1.088          Mean   : 0.3239   $1-24,999     :2620  
##  3rd Qu.: 2.000          3rd Qu.: 0.3000   $75,000-99,999:2418  
##  Max.   :17.000          Max.   :10.0100   $100,000+     :2132  
##  NA's   :7544            NA's   :1258      (Other)       : 733  
##  IncomeVerifiable StatedMonthlyIncome                    LoanKey     
##  Mode :logical    Min.   :     0      00013421083473792D70F75:    1  
##  FALSE:1336       1st Qu.:  2500      000534180797040005C07AA:    1  
##  TRUE :27748      Median :  3833      00093413855467649508680:    1  
##                   Mean   :  4665      000B3366346245964D6187E:    1  
##                   3rd Qu.:  5752      000B34179327090460D3429:    1  
##                   Max.   :208333      000E3392089465002A7DBA0:    1  
##                                       (Other)                :29078  
##  TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :1.000     Min.   : 0.00              Min.   : 0.00        
##  1st Qu.:1.000     1st Qu.: 7.00              1st Qu.: 6.00        
##  Median :1.000     Median :10.00              Median :10.00        
##  Mean   :1.079     Mean   :11.09              Mean   :10.87        
##  3rd Qu.:1.000     3rd Qu.:14.00              3rd Qu.:14.00        
##  Max.   :5.000     Max.   :42.00              Max.   :41.00        
##  NA's   :26796     NA's   :26796              NA's   :26796        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   :0.000                       Min.   :0.000                  
##  1st Qu.:0.000                       1st Qu.:0.000                  
##  Median :0.000                       Median :0.000                  
##  Mean   :0.205                       Mean   :0.011                  
##  3rd Qu.:0.000                       3rd Qu.:0.000                  
##  Max.   :7.000                       Max.   :5.000                  
##  NA's   :26796                       NA's   :26796                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   : 1000            Min.   :    0              
##  1st Qu.: 2550            1st Qu.:    0              
##  Median : 4500            Median : 1970              
##  Mean   : 6012            Mean   : 3027              
##  3rd Qu.: 7500            3rd Qu.: 4145              
##  Max.   :40000            Max.   :21862              
##  NA's   :26796            NA's   :26796              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-160.000            Min.   :   0.0           
##  1st Qu.:   0.000            1st Qu.:   0.0           
##  Median :   0.000            Median :   0.0           
##  Mean   :   7.363            Mean   : 491.8           
##  3rd Qu.:  40.000            3rd Qu.: 948.2           
##  Max.   : 215.000            Max.   :2704.0           
##  NA's   :26798                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber   
##  Min.   : 0.00                 Min.   : 39.00             Min.   :    1  
##  1st Qu.:10.00                 1st Qu.: 70.00             1st Qu.: 7395  
##  Median :16.00                 Median : 78.00             Median :19450  
##  Mean   :17.32                 Mean   : 78.21             Mean   :19418  
##  3rd Qu.:24.00                 3rd Qu.: 85.00             3rd Qu.:30463  
##  Max.   :44.00                 Max.   :100.00             Max.   :46378  
##  NA's   :18376                                                           
##  LoanOriginalAmount LoanOriginationDate  LoanOriginationQuarter
##  Min.   : 1000      Min.   :2005-11-15   Q2 2008: 4344         
##  1st Qu.: 2500      1st Qu.:2007-02-13   Q3 2008: 3602         
##  Median : 4500      Median :2007-09-21   Q2 2007: 3118         
##  Mean   : 6159      Mean   :2007-09-09   Q1 2007: 3079         
##  3rd Qu.: 7904      3rd Qu.:2008-05-02   Q1 2008: 3074         
##  Max.   :25000      Max.   :2010-12-30   (Other):11845         
##                                          NA's   :   22         
##                    MemberKey     MonthlyLoanPayment LP_CustomerPayments
##  3EF133647645155044BFFD9:    6   Min.   :   0.00    Min.   :    0      
##  7E1733653050264822FAA3D:    6   1st Qu.:  84.84    1st Qu.: 1647      
##  16083364744933457E57FB9:    4   Median : 153.80    Median : 3778      
##  242A33660960718280E1642:    4   Mean   : 215.72    Mean   : 5683      
##  5B8333756488098823F5EFE:    4   3rd Qu.: 275.77    3rd Qu.: 7403      
##  63CA34120866140639431C9:    4   Max.   :1130.90    Max.   :40702      
##  (Other)                :29056                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0                Min.   :    0.0    Min.   :-664.87  
##  1st Qu.: 1069                1st Qu.:  335.4    1st Qu.: -76.15  
##  Median : 3000                Median :  779.3    Median : -33.50  
##  Mean   : 4502                Mean   : 1180.7    Mean   : -54.97  
##  3rd Qu.: 6000                3rd Qu.: 1532.2    3rd Qu.: -13.14  
##  Max.   :25693                Max.   :15617.0    Max.   :  32.06  
##                                                                   
##  LP_CollectionFees  LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-9274.75   Min.   :    0         Min.   : -954.5    
##  1st Qu.:    0.00   1st Qu.:    0         1st Qu.:    0.0    
##  Median :    0.00   Median :    0         Median :    0.0    
##  Mean   :  -31.86   Mean   : 1647         Mean   : 1596.6    
##  3rd Qu.:    0.00   3rd Qu.: 1863         3rd Qu.: 1748.7    
##  Max.   :    0.00   Max.   :25000         Max.   :25000.0    
##                                                              
##  LP_NonPrincipalRecoverypayments PercentFunded   Recommendations  
##  Min.   :    0.00                Min.   :1.000   Min.   : 0.0000  
##  1st Qu.:    0.00                1st Qu.:1.000   1st Qu.: 0.0000  
##  Median :    0.00                Median :1.000   Median : 0.0000  
##  Mean   :   76.19                Mean   :1.000   Mean   : 0.1369  
##  3rd Qu.:    0.00                3rd Qu.:1.000   3rd Qu.: 0.0000  
##  Max.   :21117.90                Max.   :1.011   Max.   :39.0000  
##                                                                   
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors    
##  Min.   : 0.00000           Min.   :    0.00            Min.   :  1.0  
##  1st Qu.: 0.00000           1st Qu.:    0.00            1st Qu.: 34.0  
##  Median : 0.00000           Median :    0.00            Median : 78.0  
##  Mean   : 0.06842           Mean   :   52.25            Mean   :116.1  
##  3rd Qu.: 0.00000           3rd Qu.:    0.00            3rd Qu.:158.0  
##  Max.   :33.00000           Max.   :25000.00            Max.   :913.0  
## 

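That is indeed the case: the latest ListingCreationDate in this subset is 2009-05-06. EstimatedLoss, EstimatedReturn, ProsperRating, and ProsperScore are also missing for all 29,084 of these loans, so I will assume the same explanation applies to those measures.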
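The same check can be written more compactly than reading it off a full summary. The sketch below, on an invented `toy` tibble standing in for `data`, splits closed loans by whether the yield estimate is missing and reports the listing date range of each group. If the estimates only exist for later listings, the group with missing estimates should end before the other group begins.

```r
library(dplyr)

# Toy stand-in for `data`; dates and yields are invented for illustration.
toy <- tibble(
  ListingCreationDate = as.Date(c("2007-03-01", "2008-11-15", "2009-05-06",
                                  "2010-02-10", "2012-07-04")),
  ClosedDate          = as.Date(c("2009-01-01", "2010-06-30", "2011-05-01",
                                  "2012-02-10", NA)),
  EstimatedEffectiveYield = c(NA, NA, NA, 0.12, 0.18)
)

# Among closed loans only, compare listings with and without a yield estimate.
yield_by_period <- toy %>%
  filter(!is.na(ClosedDate)) %>%
  group_by(yield_missing = is.na(EstimatedEffectiveYield)) %>%
  summarise(
    n             = n(),
    first_listing = min(ListingCreationDate),
    last_listing  = max(ListingCreationDate)
  )

yield_by_period
```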

I see that some borrower demographic, employment, and previous credit information is missing, but I assume that this is simply missing data, with no larger story behind it, particularly as this is a relatively small percentage of loans. I also see that more of this information is missing for loans that have been closed, which suggests to me that this data was either lost, or not gathered as thoroughly in the past.

The majority of the borrowers in both categories have no prior Prosper history, and it would be interesting to see if, for example, not having any Prosper history leads to more delinquencies than having positive Prosper history.
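One way to start on that question is to compare the share of delinquent loans between borrowers with and without prior Prosper loans. This is only a sketch under two assumptions that the data does not confirm: that a missing TotalProsperLoans means no Prosper history, and that LoanCurrentDaysDelinquent above zero is a reasonable marker of delinquency. The `toy` values are invented.

```r
library(dplyr)

# Toy stand-in for `data`; values are invented for illustration only.
toy <- tibble(
  TotalProsperLoans         = c(NA, NA, NA, 1, 2, NA, 1, NA),
  LoanCurrentDaysDelinquent = c(0, 120, 0, 0, 0, 45, 30, 0)
)

delinq_by_history <- toy %>%
  mutate(
    # Assumption: NA in TotalProsperLoans means no prior Prosper history.
    has_history = !is.na(TotalProsperLoans),
    # Assumption: any positive number of days delinquent counts as delinquent.
    delinquent  = LoanCurrentDaysDelinquent > 0
  ) %>%
  group_by(has_history) %>%
  summarise(n = n(), share_delinquent = mean(delinquent))

delinq_by_history
```

This compares any history against none. Separating positive from negative history would need a further split, for example on ProsperPaymentsOneMonthPlusLate.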

Most loans were not charged off, but about 30% of closed loans became delinquent at some point (they have a value for LoanFirstDefaultedCycleNumber). A very small number of open loans are delinquent.

Lender Profit

At this point, I want to plot how predictive the above background, financial, and demographic measures are of the measures most closely related to lender profit.

LoanStatus vs. CreditGrade/ProsperRating

In the case of LoanStatus, as this is not a quantitative or clearly ordered factor, it may make sense to at least visually organize some of the levels. I therefore ‘group’ all Past Due levels together, and order the levels loosely in terms of ‘goodness’ - assuming that being on time, or having paid off the loan, is ‘good,’ and that having defaulted, or having the loan charged off, is ‘bad.’ I group CreditGrade and ProsperRating into one measure, and then plot LoanStatus by this new rating, to see if there are any obvious patterns on how likely one is to have a particular loan status, given a particular starting rating.
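The grouping described above could be sketched roughly as follows. This is not necessarily the code behind the plot: the `toy` tibble is invented, the names `Status`, `Rating`, `rating_levels`, and `status_share` are hypothetical, and the level orderings (including placing NC lowest) are assumptions about what "goodness" means here.

```r
library(dplyr)
library(forcats)

# Toy stand-in for `data`; values are invented for illustration only.
toy <- tibble(
  LoanStatus = factor(c("Current", "Completed", "Past Due (1-15 days)",
                        "Past Due (61-90 days)", "Chargedoff", "Defaulted")),
  CreditGrade         = factor(c(NA, "AA", NA, NA, "HR", "C")),
  ProsperRating.alpha = factor(c("A", NA, "D", "E", NA, NA))
)

# Assumed ordering of the shared rating labels, from worst to best.
rating_levels <- c("NC", "HR", "E", "D", "C", "B", "A", "AA")

grouped <- toy %>%
  mutate(
    # Merge every "Past Due (...)" level into a single "Past Due" level.
    Status = fct_collapse(
      LoanStatus,
      `Past Due` = grep("^Past Due", levels(LoanStatus), value = TRUE)
    ),
    # Order loosely from 'good' to 'bad'; unlisted levels stay at the end.
    Status = fct_relevel(Status, "Completed", "Current", "Past Due",
                         "Chargedoff", "Defaulted"),
    # Use CreditGrade where present, otherwise fall back to ProsperRating.
    Rating = factor(
      coalesce(as.character(CreditGrade), as.character(ProsperRating.alpha)),
      levels = rating_levels, ordered = TRUE
    )
  )

# Share of each status within each rating, which is what a
# proportional stacked bar chart of Status by Rating would display.
status_share <- grouped %>%
  count(Rating, Status) %>%
  group_by(Rating) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

status_share
```

Converting both rating columns to character before `coalesce()` avoids problems with combining factors whose level sets differ.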

What I see here is that the higher the rating, the greater the likelihood that the loan is either completed or current, and the lower the likelihood that it is past due, charged off, or defaulted. Overall, it seems that a customer with a higher rating at the time the loan is posted will indeed be more likely to pay off a loan in the future.

Exploring Profit Measures

First, I want to get a sense of when these measures might be getting assigned, in cases where documentation does not make this clear. To make this more clear, I will look at loans which have not been closed, and see if they systematically include this information (compared to loans which are closed). If they do, it’s relatively safe to say that these measures are predictions, rather than reports of actual yield.
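A compact version of this comparison is to compute, for open and closed loans separately, the share of rows in which each estimate is present. The sketch below uses an invented `toy` tibble in place of `data`, and `is_open` and `estimate_coverage` are hypothetical names. If open loans show full coverage, the estimates cannot be reports of realized yield.

```r
library(dplyr)

# Toy stand-in for `data`; values are invented for illustration only.
toy <- tibble(
  ClosedDate = as.Date(c(NA, NA, NA, "2009-03-01", "2011-08-15", "2012-01-20")),
  EstimatedEffectiveYield = c(0.15, 0.21, 0.09, NA, 0.12, NA),
  EstimatedLoss           = c(0.06, 0.10, 0.03, NA, 0.05, NA),
  EstimatedReturn         = c(0.09, 0.11, 0.06, NA, 0.07, NA)
)

# For open vs. closed loans, the share of rows where each estimate is present.
estimate_coverage <- toy %>%
  group_by(is_open = is.na(ClosedDate)) %>%
  summarise(
    n = n(),
    across(starts_with("Estimated"), ~ mean(!is.na(.x)))
  )

estimate_coverage
```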

summary(filter(data, is.na(ClosedDate)))
##                    ListingKey    ListingNumber     ListingCreationDate 
##  17A93590655669644DB4C06:    6   Min.   : 464139   Min.   :2010-06-24  
##  349D3587495831350F0F648:    4   1st Qu.: 682358   1st Qu.:2012-12-04  
##  47C1359638497431975670B:    4   Median : 875238   Median :2013-08-20  
##  8474358854651984137201C:    4   Mean   : 870182   Mean   :2013-05-16  
##  DE8535960513435199406CE:    4   3rd Qu.:1051465   3rd Qu.:2013-12-05  
##  04C13599434217079754AEE:    3   Max.   :1255725   Max.   :2014-03-10  
##  (Other)                :58823                                         
##   CreditGrade         Term                        LoanStatus   
##  NC     :    0   Min.   :12.00   Current               :56576  
##  HR     :    0   1st Qu.:36.00   Past Due (1-15 days)  :  806  
##  E      :    0   Median :36.00   Past Due (31-60 days) :  363  
##  D      :    0   Mean   :44.47   Past Due (61-90 days) :  313  
##  C      :    0   3rd Qu.:60.00   Past Due (91-120 days):  304  
##  (Other):    0   Max.   :60.00   Past Due (16-30 days) :  265  
##  NA's   :58848                   (Other)               :  221  
##    ClosedDate     BorrowerAPR       BorrowerRate     LenderYield    
##  Min.   :NA      Min.   :0.06106   Min.   :0.0577   Min.   :0.0477  
##  1st Qu.:NA      1st Qu.:0.16056   1st Qu.:0.1334   1st Qu.:0.1234  
##  Median :NA      Median :0.20679   Median :0.1769   Median :0.1669  
##  Mean   :NA      Mean   :0.21568   Mean   :0.1856   Mean   :0.1756  
##  3rd Qu.:NA      3rd Qu.:0.26877   3rd Qu.:0.2346   3rd Qu.:0.2246  
##  Max.   :NA      Max.   :0.38486   Max.   :0.3435   Max.   :0.3335  
##  NA's   :58848                                                      
##  EstimatedEffectiveYield EstimatedLoss     EstimatedReturn  
##  Min.   :0.0474          Min.   :0.00490   Min.   :0.03700  
##  1st Qu.:0.1181          1st Qu.:0.04200   1st Qu.:0.07400  
##  Median :0.1575          Median :0.06490   Median :0.08728  
##  Mean   :0.1653          Mean   :0.07435   Mean   :0.09100  
##  3rd Qu.:0.2086          3rd Qu.:0.10250   3rd Qu.:0.10790  
##  Max.   :0.3057          Max.   :0.20300   Max.   :0.17610  
##                                                             
##  ProsperRating.num ProsperRating.alpha  ProsperScore   ListingCategory.num
##  Min.   :1.000     C      :14528       Min.   : 1.00   Min.   : 0.000     
##  1st Qu.:3.000     B      :12208       1st Qu.: 4.00   1st Qu.: 1.000     
##  Median :4.000     A      :10943       Median : 6.00   Median : 1.000     
##  Mean   :4.253     D      : 8405       Mean   : 5.81   Mean   : 3.118     
##  3rd Qu.:5.000     E      : 5965       3rd Qu.: 8.00   3rd Qu.: 2.000     
##  Max.   :7.000     AA     : 3589       Max.   :11.00   Max.   :20.000     
##                    (Other): 3210                                          
##  BorrowerState                 Occupation         EmploymentStatus
##  CA     : 7454   Other              :14561   Employed     :50831  
##  NY     : 4214   Professional       : 7113   Self-employed: 3208  
##  TX     : 4090   Executive          : 2522   Other        : 3008  
##  FL     : 3642   Teacher            : 2111   Full-time    : 1397  
##  IL     : 2882   Computer Programmer: 1984   Not employed :  274  
##  OH     : 2389   (Other)            :29237   Retired      :   98  
##  (Other):34177   NA's               : 1320   (Other)      :   32  
##  EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
##  Min.   :  0.0            Mode :logical       Mode :logical   
##  1st Qu.: 32.0            FALSE:27257         FALSE:57973     
##  Median : 79.0            TRUE :31591         TRUE :875       
##  Mean   :108.3                                                
##  3rd Qu.:156.0                                                
##  Max.   :733.0                                                
##  NA's   :10                                                   
##                     GroupKey     DateCreditPulled    
##  3D4D3366260257624AB272D:  110   Min.   :2008-01-23  
##  783C3371218786870A73D20:   79   1st Qu.:2012-12-03  
##  52EA3425051368132B80C96:   41   Median :2013-08-22  
##  FEF83377364176536637E50:   29   Mean   :2013-05-17  
##  6A3B336601725506917317E:   26   3rd Qu.:2013-12-05  
##  (Other)                :  387   Max.   :2014-03-10  
##  NA's                   :58176                       
##  CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
##  Min.   :600.0         Min.   :619.0         Min.   :1951-01-01     
##  1st Qu.:660.0         1st Qu.:679.0         1st Qu.:1990-03-01     
##  Median :700.0         Median :719.0         Median :1995-11-22     
##  Mean   :698.4         Mean   :717.4         Mean   :1994-11-04     
##  3rd Qu.:720.0         3rd Qu.:739.0         3rd Qu.:2000-05-11     
##  Max.   :880.0         Max.   :899.0         Max.   :2012-12-22     
##                                                                     
##  CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
##  Min.   : 0.00      Min.   : 0      Min.   :  2.00            
##  1st Qu.: 7.00      1st Qu.: 7      1st Qu.: 19.00            
##  Median :10.00      Median : 9      Median : 27.00            
##  Mean   :10.92      Mean   :10      Mean   : 28.12            
##  3rd Qu.:14.00      3rd Qu.:13      3rd Qu.: 36.00            
##  Max.   :54.00      Max.   :54      Max.   :125.00            
##                                                               
##  OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
##  Min.   : 0.000        Min.   :    0.0             Min.   : 0.0000     
##  1st Qu.: 5.000        1st Qu.:  188.0             1st Qu.: 0.0000     
##  Median : 7.000        Median :  344.0             Median : 0.0000     
##  Mean   : 7.805        Mean   :  466.6             Mean   : 0.8649     
##  3rd Qu.:10.000        3rd Qu.:  606.0             3rd Qu.: 1.0000     
##  Max.   :50.000        Max.   :13765.0             Max.   :15.0000     
##                                                                        
##  TotalInquiries   CurrentDelinquencies AmountDelinquent
##  Min.   : 0.000   Min.   : 0.0000      Min.   :     0  
##  1st Qu.: 2.000   1st Qu.: 0.0000      1st Qu.:     0  
##  Median : 3.000   Median : 0.0000      Median :     0  
##  Mean   : 4.134   Mean   : 0.3015      Mean   :   931  
##  3rd Qu.: 6.000   3rd Qu.: 0.0000      3rd Qu.:     0  
##  Max.   :78.000   Max.   :51.0000      Max.   :463881  
##                                                        
##  DelinquenciesLast7Years PublicRecordsLast10Years
##  Min.   : 0.000          Min.   : 0.0000         
##  1st Qu.: 0.000          1st Qu.: 0.0000         
##  Median : 0.000          Median : 0.0000         
##  Mean   : 3.772          Mean   : 0.2956         
##  3rd Qu.: 2.000          3rd Qu.: 0.0000         
##  Max.   :99.000          Max.   :38.0000         
##                                                  
##  PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
##  Min.   : 0.00000          Min.   :     0         Min.   :0.0000     
##  1st Qu.: 0.00000          1st Qu.:  4736         1st Qu.:0.3700     
##  Median : 0.00000          Median : 10388         Median :0.6200     
##  Mean   : 0.00814          Mean   : 19140         Mean   :0.5862     
##  3rd Qu.: 0.00000          3rd Qu.: 21972         3rd Qu.:0.8300     
##  Max.   :20.00000          Max.   :999165         Max.   :1.8200     
##                                                                      
##  AvailableBankcardCredit  TotalTrades    TradesNeverDelinquent.per
##  Min.   :     0          Min.   :  1.0   Min.   :0.0800           
##  1st Qu.:  1296          1st Qu.: 16.0   1st Qu.:0.8500           
##  Median :  4727          Median : 23.0   Median :0.9600           
##  Mean   : 11506          Mean   : 24.4   Mean   :0.9097           
##  3rd Qu.: 14111          3rd Qu.: 31.0   3rd Qu.:1.0000           
##  Max.   :498374          Max.   :108.0   Max.   :1.0000           
##                                                                   
##  TradesOpenedLast6Months DebtToIncomeRatio         IncomeRange   
##  Min.   : 0.0000         Min.   : 0.000    $50,000-74,999:18261  
##  1st Qu.: 0.0000         1st Qu.: 0.160    $25,000-49,999:15848  
##  Median : 0.0000         Median : 0.230    $100,000+     :11273  
##  Mean   : 0.7159         Mean   : 0.263    $75,000-99,999:10474  
##  3rd Qu.: 1.0000         3rd Qu.: 0.320    $1-24,999     : 2703  
##  Max.   :16.0000         Max.   :10.010    Not employed  :  274  
##                          NA's   :4324      (Other)       :   15  
##  IncomeVerifiable StatedMonthlyIncome                    LoanKey     
##  Mode :logical    Min.   :      0     CB1B37030986463208432A1:    6  
##  FALSE:4368       1st Qu.:   3617     2DEE3698211017519D7333F:    4  
##  TRUE :54480      Median :   5167     9F4B37043517554537C364C:    4  
##                   Mean   :   6126     D895370150591392337ED6D:    4  
##                   3rd Qu.:   7417     E6FB37073953690388BC56D:    4  
##                   Max.   :1750003     0D8F37036734373301ED419:    3  
##                                       (Other)                :58823  
##  TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
##  Min.   :1.0       Min.   :  0.00             Min.   :  0.00       
##  1st Qu.:1.0       1st Qu.: 10.00             1st Qu.: 10.00       
##  Median :1.0       Median : 17.00             Median : 17.00       
##  Mean   :1.5       Mean   : 25.54             Mean   : 24.81       
##  3rd Qu.:2.0       3rd Qu.: 35.00             3rd Qu.: 35.00       
##  Max.   :8.0       Max.   :141.00             Max.   :141.00       
##  NA's   :47302     NA's   :47302              NA's   :47302        
##  ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
##  Min.   : 0.00                       Min.   : 0.00                  
##  1st Qu.: 0.00                       1st Qu.: 0.00                  
##  Median : 0.00                       Median : 0.00                  
##  Mean   : 0.68                       Mean   : 0.05                  
##  3rd Qu.: 0.00                       3rd Qu.: 0.00                  
##  Max.   :42.00                       Max.   :21.00                  
##  NA's   :47302                       NA's   :47302                  
##  ProsperPrincipalBorrowed ProsperPrincipalOutstanding
##  Min.   : 1000            Min.   :    0.00           
##  1st Qu.: 4000            1st Qu.:    0.01           
##  Median : 7400            Median : 2213.24           
##  Mean   : 9721            Mean   : 3475.83           
##  3rd Qu.:13500            3rd Qu.: 5204.00           
##  Max.   :72499            Max.   :23450.95           
##  NA's   :47302            NA's   :47302              
##  ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
##  Min.   :-209.0              Min.   :  0.000          
##  1st Qu.: -38.0              1st Qu.:  0.000          
##  Median :  -9.0              Median :  0.000          
##  Mean   :  -8.6              Mean   :  1.468          
##  3rd Qu.:  18.0              3rd Qu.:  0.000          
##  Max.   : 220.0              Max.   :129.000          
##  NA's   :50362                                        
##  LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination   LoanNumber    
##  Min.   : 1.00                 Min.   : 0.00              Min.   : 43212  
##  1st Qu.: 1.00                 1st Qu.: 3.00              1st Qu.: 79386  
##  Median : 7.50                 Median : 7.00              Median :100276  
##  Mean   :11.88                 Mean   : 9.68              Mean   : 98941  
##  3rd Qu.:17.25                 3rd Qu.:15.00              3rd Qu.:121614  
##  Max.   :38.00                 Max.   :45.00              Max.   :136486  
##  NA's   :58840                                                            
##  LoanOriginalAmount LoanOriginationDate  LoanOriginationQuarter
##  Min.   : 1500      Min.   :2010-06-30   Q4 2013:14058         
##  1st Qu.: 4000      1st Qu.:2012-12-18   Q1 2014:12103         
##  Median :10000      Median :2013-08-29   Q3 2013: 8592         
##  Mean   :10280      Mean   :2013-05-27   Q2 2013: 6268         
##  3rd Qu.:15000      3rd Qu.:2013-12-16   Q3 2012: 3419         
##  Max.   :35000      Max.   :2014-03-12   Q4 2012: 3022         
##                                          (Other):11386         
##                    MemberKey     MonthlyLoanPayment LP_CustomerPayments
##  F80D3694083622957BA09F2:    6   Min.   :   0.0     Min.   :    0      
##  0F0C35762146892131F3BB4:    4   1st Qu.: 166.6     1st Qu.:  555      
##  22B53699795042922A27DCC:    4   Median : 286.9     Median : 1516      
##  61E93477058090904D07D4F:    4   Mean   : 318.1     Mean   : 2550      
##  946A35068649687154063A9:    4   3rd Qu.: 415.1     3rd Qu.: 3367      
##  EA463494084516244B9C542:    4   Max.   :2163.6     Max.   :31613      
##  (Other)                :58822                                         
##  LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees   
##  Min.   :    0.0              Min.   :    0.0    Min.   :-564.85  
##  1st Qu.:  286.8              1st Qu.:  221.7    1st Qu.: -73.29  
##  Median :  795.5              Median :  640.4    Median : -34.75  
##  Mean   : 1519.1              Mean   : 1031.3    Mean   : -55.72  
##  3rd Qu.: 1872.4              3rd Qu.: 1410.9    3rd Qu.: -13.11  
##  Max.   :30831.1              Max.   :10572.8    Max.   :   0.77  
##                                                                   
##  LP_CollectionFees   LP_GrossPrincipalLoss LP_NetPrincipalLoss
##  Min.   :-1242.460   Min.   :0             Min.   :0          
##  1st Qu.:    0.000   1st Qu.:0             1st Qu.:0          
##  Median :    0.000   Median :0             Median :0          
##  Mean   :   -4.171   Mean   :0             Mean   :0          
##  3rd Qu.:    0.000   3rd Qu.:0             3rd Qu.:0          
##  Max.   :    0.000   Max.   :0             Max.   :0          
##                                                               
##  LP_NonPrincipalRecoverypayments PercentFunded    Recommendations    
##  Min.   :0                       Min.   :0.7000   Min.   : 0.000000  
##  1st Qu.:0                       1st Qu.:1.0000   1st Qu.: 0.000000  
##  Median :0                       Median :1.0000   Median : 0.000000  
##  Mean   :0                       Mean   :0.9986   Mean   : 0.009312  
##  3rd Qu.:0                       3rd Qu.:1.0000   3rd Qu.: 0.000000  
##  Max.   :0                       Max.   :1.0125   Max.   :19.000000  
##                                                                      
##  InvestmentFromFriendsCount InvestmentFromFriendsAmount   Investors     
##  Min.   :0.00000            Min.   :   0.0000           Min.   :  1.00  
##  1st Qu.:0.00000            1st Qu.:   0.0000           1st Qu.:  1.00  
##  Median :0.00000            Median :   0.0000           Median :  8.00  
##  Mean   :0.00226            Mean   :   0.6037           Mean   : 57.62  
##  3rd Qu.:0.00000            3rd Qu.:   0.0000           3rd Qu.: 79.00  
##  Max.   :6.00000            Max.   :3000.0000           Max.   :779.00  
## 

Every open loan has a non-zero value for LenderYield, EstimatedEffectiveYield, EstimatedLoss, and EstimatedReturn (see the minima above), which suggests that these measures are forecasts rather than records of actual outcomes. In contrast, many open loans have zero values for the realized payment and loss measures: LP_CustomerPayments, LP_CustomerPrincipalPayments, LP_InterestandFees, LP_ServiceFees, LP_CollectionFees, LP_GrossPrincipalLoss, LP_NetPrincipalLoss, and LP_NonPrincipalRecoverypayments (in fact, the last 3 are zero for every open loan). I will take a closer look at these.
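
As a rough check on this kind of observation, the share of exact zeros per column can be computed directly instead of being read off the summary. The sketch below uses a small made-up data frame (the name open_loans and all of its values are assumptions for illustration, not the Prosper data):

```r
# Toy stand-in for the open-loan subset; the values are invented.
open_loans <- data.frame(
  EstimatedReturn       = c(0.08, 0.09, 0.11, 0.07),
  LP_CustomerPayments   = c(0, 555, 0, 1516),
  LP_GrossPrincipalLoss = c(0, 0, 0, 0)
)

# Share of rows that are exactly zero in each column (NAs are ignored).
zero_share <- sapply(open_loans, function(x) mean(x == 0, na.rm = TRUE))
print(zero_share)

# Columns that are zero in every row carry no information for this subset.
all_zero <- names(zero_share)[zero_share == 1]
print(all_zero)
```

A column that ends up in all_zero cannot distinguish between loans in the subset, which is one way to decide which measures to set aside.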

Relationship Between Measures

Here, I quickly want to look at how well-correlated the numerical factors associated with profit are, to see if I need to look at all of them when seeing how predictive demographic factors are of profit. I expect, in any case, that the most productive factors to look at are EstimatedEffectiveYield (an overall view of how much lenders profit), EstimatedLoss (as this separately looks at principal loss), and LoanCurrentDaysDelinquent (as delinquency, even if the loan is ultimately paid, is likely of interest to lenders).

profit <- c("LenderYield", "EstimatedEffectiveYield", "EstimatedLoss", "LoanCurrentDaysDelinquent", "LP_GrossPrincipalLoss", "LP_NetPrincipalLoss","ProsperRating.num")

library(ggcorrplot)

corr <- cor(data[profit], use = "complete.obs")
head(corr[, 1:6])
##                           LenderYield EstimatedEffectiveYield
## LenderYield                 1.0000000               0.8953425
## EstimatedEffectiveYield     0.8953425               1.0000000
## EstimatedLoss               0.9453084               0.7981346
## LoanCurrentDaysDelinquent   0.2157334               0.1342877
## LP_GrossPrincipalLoss       0.1362828               0.1394210
## LP_NetPrincipalLoss         0.1347704               0.1387118
##                           EstimatedLoss LoanCurrentDaysDelinquent
## LenderYield                  0.94530836                 0.2157334
## EstimatedEffectiveYield      0.79813456                 0.1342877
## EstimatedLoss                1.00000000                 0.1953217
## LoanCurrentDaysDelinquent    0.19532174                 1.0000000
## LP_GrossPrincipalLoss        0.09333752                 0.6034317
## LP_NetPrincipalLoss          0.09205293                 0.6049597
##                           LP_GrossPrincipalLoss LP_NetPrincipalLoss
## LenderYield                          0.13628281          0.13477041
## EstimatedEffectiveYield              0.13942102          0.13871181
## EstimatedLoss                        0.09333752          0.09205293
## LoanCurrentDaysDelinquent            0.60343173          0.60495970
## LP_GrossPrincipalLoss                1.00000000          0.99330220
## LP_NetPrincipalLoss                  0.99330220          1.00000000
ggcorrplot(corr, hc.order = TRUE, type = "lower",
     outline.col = "white", lab = TRUE)

It turns out that most of the potential profit measures are not that well-correlated. Several, however, are well-correlated with each other (positively or negatively), and I expect these to be better representations of profit: if a potential measure of profit correlates with no other potential measure, then it is unlikely to represent profit well, unless it happens to be the single accurate measure of profit in the bunch, which is unlikely.
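
One way to make "well-correlated with the other measures" concrete is to average the absolute off-diagonal correlations for each variable and rank the variables by that average. This is only a sketch on simulated data (the columns a to d and the noise levels are assumptions, not columns of the loan data):

```r
set.seed(1)
n <- 200
base <- rnorm(n)

# a, b and c share a common signal (c with the opposite sign); d is pure noise.
toy <- data.frame(
  a = base + rnorm(n, sd = 0.3),
  b = base + rnorm(n, sd = 0.3),
  c = -base + rnorm(n, sd = 0.3),
  d = rnorm(n)
)

corr_toy <- cor(toy, use = "complete.obs")

# Ignore the sign and the diagonal, then average per variable.
abs_corr <- abs(corr_toy)
diag(abs_corr) <- NA
mean_abs <- sort(rowMeans(abs_corr, na.rm = TRUE), decreasing = TRUE)
print(round(mean_abs, 2))
```

Variables at the bottom of mean_abs are the ones that share little with the rest. The same three lines after cor() could be applied to the corr matrix computed above.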

The profit measures showing the highest correlations with other measures are the following: EstimatedReturn, EstimatedEffectiveYield, EstimatedLoss, LenderYield, and ProsperRating. It’s likely that the other measures are informative for other, more specific, questions, but at a first glance, it makes sense to look at the most obvious measures of profit. It’s also possible that LoanStatus, previously looked at, is also informative, but it has a more indirect relationship to profit (especially given that, as a category, it is inherently in flux).

It is possible that measures of delinquency (LoanCurrentDaysDelinquent, OnTimeProsperPayments, CurrentDelinquencies, and AmountDelinquent) would affect lenders’ willingness to engage with clients regardless of ultimate gain or loss, particularly for lenders who rely on a regular ‘income.’ In this case, it would also be worth looking at how much the various demographic predictors correlate with these measures, particularly as they do not seem to be reflected by the Prosper rating (or other measures).

It is not clear to me exactly what LP_GrossPrincipalLoss and LP_NetPrincipalLoss mean, so I will not look at them for now. In addition, both are reasonably well-correlated (about 0.60) with one of the delinquency measures, LoanCurrentDaysDelinquent.

EstimatedReturn vs. EstimatedEffectiveYield

ggplot(data, aes(x = EstimatedReturn, y = EstimatedEffectiveYield)) + 
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  stat_smooth(n=2000) +
  labs(title = "EstimatedReturn by EstimatedEffectiveYield") +
  ylim(-0.5,0.5)
## Warning: Removed 29084 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 29084 rows containing non-finite values (stat_smooth).
## Warning: Removed 29084 rows containing missing values (geom_point).
## Warning: Removed 437 rows containing missing values (geom_pointrange).

EstimatedReturn vs. EstimatedLoss

ggplot(filter(data, !is.na(EstimatedReturn) & !is.na(EstimatedLoss)), aes(x = EstimatedReturn, y = EstimatedLoss)) + 
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "EstimatedReturn by EstimatedLoss") + 
  ylim(0,0.5)
## `geom_smooth()` using method = 'gam'
## Warning: Removed 568 rows containing missing values (geom_pointrange).

EstimatedReturn vs. LenderYield

ggplot(data, aes(x = EstimatedReturn, y = LenderYield)) + 
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "EstimatedReturn by LenderYield") + 
  ylim(-0.1,0.5)
## Warning: Removed 29084 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 29084 rows containing non-finite values (stat_smooth).
## Warning: Removed 29084 rows containing missing values (geom_point).
## Warning: Removed 497 rows containing missing values (geom_pointrange).

EstimatedEffectiveYield vs. EstimatedLoss

ggplot(data, aes(x = EstimatedEffectiveYield, y = EstimatedLoss)) + 
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "EstimatedEffectiveYield by EstimatedLoss") + 
  ylim(0,0.5)
## Warning: Removed 29084 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 29084 rows containing non-finite values (stat_smooth).
## Warning: Removed 29084 rows containing missing values (geom_point).
## Warning: Removed 638 rows containing missing values (geom_pointrange).

EstimatedEffectiveYield vs. LenderYield

ggplot(data, aes(x = EstimatedEffectiveYield, y = LenderYield)) + 
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "EstimatedEffectiveYield by LenderYield") + 
  ylim(-0.1,0.5)
## Warning: Removed 29084 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 29084 rows containing non-finite values (stat_smooth).
## Warning: Removed 29084 rows containing missing values (geom_point).
## Warning: Removed 552 rows containing missing values (geom_pointrange).

EstimatedLoss vs. LenderYield

ggplot(data, aes(x = EstimatedLoss, y = LenderYield)) + 
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "EstimatedLoss by LenderYield") + 
  ylim(-0.1,0.5)
## Warning: Removed 29084 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 29084 rows containing non-finite values (stat_smooth).
## Warning: Removed 29084 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_pointrange).

This is one of the more interesting graphs: lender yield increases with estimated loss at first, but at higher levels of loss it ceases to increase and levels off, or even drops slightly.
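
A levelling-off like this can be checked numerically by binning the x variable and comparing the mean of y from one bin to the next. The sketch below uses an invented curve with a built-in plateau (loss, yield and the break points are assumptions chosen for illustration):

```r
# Made-up data: yield rises linearly with loss, then is capped at 0.30.
loss  <- (1:20) / 100
yield <- pmin(0.05 + 1.5 * loss, 0.30)

# Mean yield within four equal-width bins of loss.
bins      <- cut(loss, breaks = c(0, 0.05, 0.10, 0.15, 0.20))
bin_means <- sapply(split(yield, bins), mean)
print(round(bin_means, 3))

# Change in mean yield from each bin to the next; a shrinking step
# indicates that the curve is flattening.
step <- diff(bin_means)
print(round(step, 3))
```

On the real data the same idea would apply to EstimatedLoss and LenderYield after dropping rows with missing values.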

Occupation vs. Profit

If credit grades are only assigned once the fate of the loan is known, it may be more useful to look at how predictive pre-existing factors such as Occupation are of profit measures.

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","Occupation")

plot_data <- data[subset] %>% 
  group_by(Occupation) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  mutate(Occupation = reorder(Occupation, ProsperRating.num, mean)) %>%
  gather(Measure, Value, -Occupation)

ggplot(plot_data, aes(x = Occupation, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Profit by Occupation") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

There are too many occupations to make easy generalizations; they would likely need to be grouped into a smaller number of categories. However, one can observe general trends: those with higher-paying occupations (or occupational prospects) seem to be more profitable customers, while students in general are among the least profitable. This suggests that income, which is grouped in a more sensible manner, may be useful to look at.
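
One simple way to reduce the number of categories is to pool every occupation that appears fewer than some number of times into a single catch-all level. The sketch below does this in base R on a toy vector (the threshold of 2 and the label "Rare" are assumptions; "Rare" is used so that it does not collide with the existing "Other" level):

```r
# Toy occupation column; level names are taken from the summary above.
occupation <- factor(c("Other", "Professional", "Teacher", "Professional",
                       "Executive", "Other", "Computer Programmer",
                       "Professional"))

# Keep levels that occur at least min_n times; pool the rest.
min_n  <- 2
counts <- table(occupation)
keep   <- names(counts)[counts >= min_n]

lumped <- as.character(occupation)
lumped[!lumped %in% keep] <- "Rare"
lumped <- factor(lumped)

print(table(lumped))
```

Pooling by frequency is only a starting point. Grouping by something meaningful, such as typical income, would need a hand-built lookup table.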

Income Range vs. Profit

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","IncomeRange")

plot_data <- data[subset] %>% 
  group_by(IncomeRange) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -IncomeRange)

ggplot(plot_data, aes(x = IncomeRange, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Profit by IncomeRange") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 4 rows containing missing values (position_stack).

Here, it can be seen that as income rises, ProsperRating increases and the other measures of profit decrease. As we have seen, ProsperRating correlates with credit score and likelihood of not defaulting. This suggests that high-income borrowers are lower-risk, while lower-income borrowers, although higher-risk, can also yield more profit.
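
Reading a trend across income ranges is easier when the factor levels are in income order rather than sorted as text. A minimal sketch, assuming only the level names visible in the summary above (any level not listed in income_order would become NA, so the remaining levels would need to be added for the real column):

```r
# Toy income column using the level names shown in the summary.
income <- factor(c("$50,000-74,999", "$1-24,999", "$100,000+",
                   "$25,000-49,999", "$75,000-99,999", "Not employed"))

# By default the levels are sorted as text, not by amount.
print(levels(income))

# Impose an explicit low-to-high order.
income_order <- c("Not employed", "$1-24,999", "$25,000-49,999",
                  "$50,000-74,999", "$75,000-99,999", "$100,000+")
income <- factor(income, levels = income_order)
print(levels(income))
```

With the levels ordered this way, a bar chart by IncomeRange would run from lowest to highest income along the axis.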

Other vs. Profit

Here, I will look at the relationship between other predictors and profit measures.
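
The blocks in this section all follow the same pattern: group by one predictor, take the mean of each profit measure, and reshape to long format for plotting. That pattern could be wrapped in a small helper. The sketch below is a base R version of that idea rather than the code used in this analysis (mean_by_group and the toy data are hypothetical):

```r
# Mean of each measure within each level of `group`, in long format
# with one row per group/measure pair.
mean_by_group <- function(df, group, measures) {
  means <- aggregate(df[measures], by = df[group], FUN = mean, na.rm = TRUE)
  data.frame(
    Group   = rep(as.character(means[[group]]), times = length(measures)),
    Measure = rep(measures, each = nrow(means)),
    Value   = unlist(means[measures], use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Toy data; the values are invented.
toy <- data.frame(
  IsBorrowerHomeowner = c(TRUE, TRUE, FALSE, FALSE),
  EstimatedLoss       = c(0.04, 0.06, 0.10, NA),
  LenderYield         = c(0.10, 0.12, 0.20, 0.22)
)

long <- mean_by_group(toy, "IsBorrowerHomeowner",
                      c("EstimatedLoss", "LenderYield"))
print(long)
```

The long result has the same Measure and Value columns that the ggplot calls in this section facet on, so one call per predictor could replace each repeated block.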

EmploymentStatus

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","EmploymentStatus")

plot_data <- data[subset] %>% 
  group_by(EmploymentStatus) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -EmploymentStatus)

ggplot(plot_data, aes(x = EmploymentStatus, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Profit by EmploymentStatus") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 8 rows containing missing values (position_stack).

Prosper ratings appear to be highest for those employed and those employed full-time (it’s not clear what the difference is), lower for those who are self-employed, retired, work part-time, or ‘other,’ and much lower for those not employed. LenderYield, EstimatedEffectiveYield, and EstimatedReturn, however, are highest for those not employed, likely reflecting the higher interest charged to people in that group. EstimatedLoss, correspondingly, is also highest for those not employed: there’s higher potential profit if the loans are paid back, but also significantly more risk.

EmploymentStatusDuration

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","EmploymentStatusDuration")

plot_data <- data[subset] %>% 
  group_by(EmploymentStatusDuration) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -EmploymentStatusDuration)

ggplot(plot_data, aes(x = EmploymentStatusDuration, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Profit by EmploymentStatusDuration") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 13 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 13 rows containing non-finite values (stat_smooth).
## Warning: Removed 13 rows containing missing values (geom_point).
## Warning: Removed 3017 rows containing missing values (geom_pointrange).
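
Grouping by a continuous predictor such as EmploymentStatusDuration produces one group per distinct value, and many of those groups contain very few loans. An alternative is to cut the predictor into a handful of bins first and average within each bin. A minimal sketch with invented numbers (duration, est_loss and the choice of quartile bins are assumptions):

```r
# Toy values; not taken from the loan data.
duration <- c(2, 10, 30, 50, 80, 120, 200, 400)
est_loss <- c(0.12, 0.11, 0.10, 0.09, 0.08, 0.07, 0.06, 0.05)

# Quartile break points give bins with roughly equal numbers of rows.
breaks <- quantile(duration, probs = seq(0, 1, by = 0.25))
bin    <- cut(duration, breaks = breaks, include.lowest = TRUE)

# Mean estimated loss within each duration bin.
binned <- aggregate(est_loss, by = list(DurationBin = bin), FUN = mean)
print(binned)
```

Each row of binned then rests on a comparable number of observations, which makes the per-group means less noisy than means over single values.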

IsBorrowerHomeowner

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","IsBorrowerHomeowner")

plot_data <- data[subset] %>% 
  group_by(IsBorrowerHomeowner) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -IsBorrowerHomeowner)

ggplot(plot_data, aes(x = IsBorrowerHomeowner, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Profit by IsBorrowerHomeowner") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

FirstRecordedCreditLine

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","FirstRecordedCreditLine")

plot_data <- data[subset] %>% 
  group_by(FirstRecordedCreditLine) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -FirstRecordedCreditLine)

ggplot(plot_data, aes(x = FirstRecordedCreditLine, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Profit by FirstRecordedCreditLine") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 1961 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 1961 rows containing non-finite values (stat_smooth).
## Warning: Removed 1961 rows containing missing values (geom_point).
## Warning: Removed 55969 rows containing missing values (geom_pointrange).

OpenRevolvingAccounts

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","OpenRevolvingAccounts")

plot_data <- data[subset] %>% 
  group_by(OpenRevolvingAccounts) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -OpenRevolvingAccounts)

ggplot(plot_data, aes(x = OpenRevolvingAccounts, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Profit by OpenRevolvingAccounts") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 8 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
## Warning: Removed 232 rows containing missing values (geom_pointrange).

InquiriesLast6Months

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","InquiriesLast6Months")

plot_data <- data[subset] %>% 
  group_by(InquiriesLast6Months) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -InquiriesLast6Months)

ggplot(plot_data, aes(x = InquiriesLast6Months, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Profit by InquiriesLast6Months") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 113 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 113 rows containing non-finite values (stat_smooth).
## Warning: Removed 113 rows containing missing values (geom_point).
## Warning: Removed 142 rows containing missing values (geom_pointrange).

RevolvingCreditBalance

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","RevolvingCreditBalance")

plot_data <- data[subset] %>% 
  group_by(RevolvingCreditBalance) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -RevolvingCreditBalance)

ggplot(plot_data, aes(x = RevolvingCreditBalance, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Profit by RevolvingCreditBalance") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 13509 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 13509 rows containing non-finite values (stat_smooth).
## Warning: Removed 13509 rows containing missing values (geom_point).
## Warning: Removed 179271 rows containing missing values (geom_pointrange).

BankcardUtilization

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","BankcardUtilization")

plot_data <- data[subset] %>% 
  group_by(BankcardUtilization) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -BankcardUtilization)

ggplot(plot_data, aes(x = BankcardUtilization, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Profit by BankcardUtilization") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 237 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 237 rows containing non-finite values (stat_smooth).
## Warning: Removed 237 rows containing missing values (geom_point).
## Warning: Removed 773 rows containing missing values (geom_pointrange).

TotalTrades

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","TotalTrades")

plot_data <- data[subset] %>% 
  group_by(TotalTrades) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -TotalTrades)

ggplot(plot_data, aes(x = TotalTrades, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Profit by TotalTrades") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 21 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 21 rows containing non-finite values (stat_smooth).
## Warning: Removed 21 rows containing missing values (geom_point).
## Warning: Removed 524 rows containing missing values (geom_pointrange).

DebtToIncomeRatio

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","DebtToIncomeRatio")

plot_data <- data[subset] %>% 
  group_by(DebtToIncomeRatio) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -DebtToIncomeRatio)

ggplot(plot_data, aes(x = DebtToIncomeRatio, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Profit by DebtToIncomeRatio") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 3797 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 3797 rows containing non-finite values (stat_smooth).
## Warning: Removed 3797 rows containing missing values (geom_point).
## Warning: Removed 2243 rows containing missing values (geom_pointrange).

IncomeVerifiable

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","IncomeVerifiable")

plot_data <- data[subset] %>% 
  group_by(IncomeVerifiable) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -IncomeVerifiable)

ggplot(plot_data, aes(x = IncomeVerifiable, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Profit by IncomeVerifiable") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

TotalProsperLoans

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","TotalProsperLoans")

plot_data <- data[subset] %>% 
  group_by(TotalProsperLoans) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -TotalProsperLoans)

ggplot(plot_data, aes(x = TotalProsperLoans, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Profit by TotalProsperLoans") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 5 rows containing missing values (position_stack).

LoanOriginationQuarter

subset <- c("EstimatedReturn", "EstimatedEffectiveYield", "EstimatedLoss", "LenderYield", "ProsperRating.num","LoanOriginationQuarter")

plot_data <- data[subset] %>% 
  group_by(LoanOriginationQuarter) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -LoanOriginationQuarter)

ggplot(plot_data, aes(x = LoanOriginationQuarter, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Profit by LoanOriginationQuarter") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 56 rows containing missing values (position_stack).

Without knowing how the loans ultimately panned out, it is difficult to use this data to make predictions about future loans.

Demographics vs. Delinquency

First, I want to see which Prosper ratings are associated with delinquency.

Occupation

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","Occupation")

plot_data <- data[subset] %>% 
  group_by(Occupation) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  mutate(Occupation = reorder(Occupation, OnTimeProsperPayments, mean)) %>%
  gather(Measure, Value, -Occupation)

ggplot(plot_data, aes(x = Occupation, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Delinquency by Occupation") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 1 rows containing missing values (position_stack).

There are too many occupations to make easy generalizations, so they would likely need to be grouped into a smaller number of categories. However, one can observe general trends: those with higher-paying occupations (or occupational prospects) seem to be more profitable customers, while students in general are among the bank’s least profitable customers. This suggests that income, which is grouped in a more sensible manner, may be useful to look at.
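One way to group occupations into fewer categories is to keep the most frequent levels and pool the rest. The sketch below is a minimal base R illustration on a made-up vector standing in for `data$Occupation`; the toy values, the cut-off of two levels, and the names `occupation`, `keep`, and `occupation_grouped` are assumptions for illustration, not part of the analysis above.

```r
# Toy stand-in for data$Occupation; the values are illustrative only.
occupation <- factor(c("Teacher", "Teacher", "Teacher", "Nurse", "Nurse",
                       "Student", "Pilot", "Chemist"))

# Count each level and keep the names of the 2 most frequent ones.
counts <- sort(table(occupation), decreasing = TRUE)
keep <- names(counts)[seq_len(2)]

# Pool every other level into a single "Other" category.
occupation_grouped <- factor(ifelse(occupation %in% keep,
                                    as.character(occupation), "Other"))

table(occupation_grouped)
```

With the tidyverse loaded, `forcats::fct_lump_n(occupation, n = 2)` should give an equivalent grouping. Grouping by frequency is only one option; grouping by sector or typical pay would need a hand-built lookup.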

IncomeRange

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","IncomeRange")

plot_data <- data[subset] %>% 
  group_by(IncomeRange) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -IncomeRange)

ggplot(plot_data, aes(x = IncomeRange, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Delinquency by IncomeRange") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 1 rows containing missing values (position_stack).

Here, as income rises, the ProsperRating increases and the other measures of profit decrease. As we have seen, ProsperRating correlates with credit score and likelihood of not defaulting. This suggests that high-income borrowers are lower-risk, while lower-income borrowers are higher-risk but can also yield more profit.
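Reading a trend "as income rises" off the bar chart depends on the order of the bars. Columns converted with `as.factor` get alphabetical levels, so the income brackets may not appear in ascending order unless they were reordered earlier. The sketch below shows one way to set the order explicitly. The level labels are my recollection of the values in this column and should be treated as assumptions; check them against `levels(data$IncomeRange)` first.

```r
# Assumed level labels, listed in the intended display order.
# Verify against levels(data$IncomeRange) before relying on them.
income_levels <- c("Not displayed", "Not employed", "$0", "$1-24,999",
                   "$25,000-49,999", "$50,000-74,999", "$75,000-99,999",
                   "$100,000+")

# Toy stand-in for data$IncomeRange with default (alphabetical) levels.
income <- factor(c("$100,000+", "$1-24,999", "$25,000-49,999", "Not employed"))

# Rebuild the factor with the explicit level order; values are unchanged.
income_ordered <- factor(income, levels = income_levels)

levels(income_ordered)
```

Applied to the real column, this would be `data$IncomeRange <- factor(data$IncomeRange, levels = income_levels)`. Any value not listed in `income_levels` becomes `NA`, which is why checking the labels first matters.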

EmploymentStatus

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","EmploymentStatus")

plot_data <- data[subset] %>% 
  group_by(EmploymentStatus) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -EmploymentStatus)

ggplot(plot_data, aes(x = EmploymentStatus, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Delinquency by EmploymentStatus") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 4 rows containing missing values (position_stack).

Here, Prosper ratings appear highest for those listed as employed or employed full-time (it’s not clear what the difference between the two is), lower for those who are self-employed, retired, working part-time, or ‘other,’ and much lower for those not employed. LenderYield, EstimatedEffectiveYield, and EstimatedReturn, however, are highest for those not employed, likely reflecting the higher interest charged to people in that group. EstimatedLoss is correspondingly also highest for those not employed: there is higher potential profit if the loans are paid back, but also significantly more risk.

EmploymentStatusDuration

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","EmploymentStatusDuration")

plot_data <- data[subset] %>% 
  group_by(EmploymentStatusDuration) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -EmploymentStatusDuration)

ggplot(plot_data, aes(x = EmploymentStatusDuration, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Delinquency by EmploymentStatusDuration") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 102 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 102 rows containing non-finite values (stat_smooth).
## Warning: Removed 102 rows containing missing values (geom_point).
## Warning: Removed 2322 rows containing missing values (geom_pointrange).

IsBorrowerHomeowner

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","IsBorrowerHomeowner")

plot_data <- data[subset] %>% 
  group_by(IsBorrowerHomeowner) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -IsBorrowerHomeowner)

ggplot(plot_data, aes(x = IsBorrowerHomeowner, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Delinquency by IsBorrowerHomeowner") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

FirstRecordedCreditLine

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","FirstRecordedCreditLine")

plot_data <- data[subset] %>% 
  group_by(FirstRecordedCreditLine) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -FirstRecordedCreditLine)

ggplot(plot_data, aes(x = FirstRecordedCreditLine, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Delinquency by FirstRecordedCreditLine") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 5117 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 5117 rows containing non-finite values (stat_smooth).
## Warning: Removed 5117 rows containing missing values (geom_point).
## Warning: Removed 41227 rows containing missing values (geom_pointrange).

OpenRevolvingAccounts

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","OpenRevolvingAccounts")

plot_data <- data[subset] %>% 
  group_by(OpenRevolvingAccounts) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -OpenRevolvingAccounts)

ggplot(plot_data, aes(x = OpenRevolvingAccounts, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Delinquency by OpenRevolvingAccounts") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 8 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).
## Warning: Removed 184 rows containing missing values (geom_pointrange).

InquiriesLast6Months

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","InquiriesLast6Months")

plot_data <- data[subset] %>% 
  group_by(InquiriesLast6Months) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -InquiriesLast6Months)

ggplot(plot_data, aes(x = InquiriesLast6Months, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Delinquency by InquiriesLast6Months") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 30 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 30 rows containing non-finite values (stat_smooth).
## Warning: Removed 30 rows containing missing values (geom_point).
## Warning: Removed 174 rows containing missing values (geom_pointrange).

RevolvingCreditBalance

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","RevolvingCreditBalance")

plot_data <- data[subset] %>% 
  group_by(RevolvingCreditBalance) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -RevolvingCreditBalance)

ggplot(plot_data, aes(x = RevolvingCreditBalance, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Delinquency by RevolvingCreditBalance") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 23589 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 23589 rows containing non-finite values (stat_smooth).
## Warning: Removed 23589 rows containing missing values (geom_point).
## Warning: Removed 130635 rows containing missing values (geom_pointrange).

BankcardUtilization

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","BankcardUtilization")

plot_data <- data[subset] %>% 
  group_by(BankcardUtilization) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -BankcardUtilization)

ggplot(plot_data, aes(x = BankcardUtilization, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Delinquency by BankcardUtilization") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 55 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 55 rows containing non-finite values (stat_smooth).
## Warning: Removed 55 rows containing missing values (geom_point).
## Warning: Removed 753 rows containing missing values (geom_pointrange).

TotalTrades

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","TotalTrades")

plot_data <- data[subset] %>% 
  group_by(TotalTrades) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -TotalTrades)

ggplot(plot_data, aes(x = TotalTrades, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Delinquency by TotalTrades") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 12 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'loess'
## Warning: Removed 12 rows containing non-finite values (stat_smooth).
## Warning: Removed 12 rows containing missing values (geom_point).
## Warning: Removed 424 rows containing missing values (geom_pointrange).

DebtToIncomeRatio

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","DebtToIncomeRatio")

plot_data <- data[subset] %>% 
  group_by(DebtToIncomeRatio) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -DebtToIncomeRatio)

ggplot(plot_data, aes(x = DebtToIncomeRatio, y=Value)) +
  geom_point() + 
  stat_summary(fun.data = mean_cl_normal) + 
  geom_smooth(formula = y~x) +
  labs(title = "Delinquency by DebtToIncomeRatio") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 2506 rows containing non-finite values (stat_summary).
## `geom_smooth()` using method = 'gam'
## Warning: Removed 2506 rows containing non-finite values (stat_smooth).
## Warning: Removed 2506 rows containing missing values (geom_point).
## Warning: Removed 2326 rows containing missing values (geom_pointrange).

IncomeVerifiable

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","IncomeVerifiable")

plot_data <- data[subset] %>% 
  group_by(IncomeVerifiable) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -IncomeVerifiable)

ggplot(plot_data, aes(x = IncomeVerifiable, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Delinquency by IncomeVerifiable") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

TotalProsperLoans

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","TotalProsperLoans")

plot_data <- data[subset] %>% 
  group_by(TotalProsperLoans) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -TotalProsperLoans)

ggplot(plot_data, aes(x = TotalProsperLoans, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Delinquency by TotalProsperLoans") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 4 rows containing missing values (position_stack).

LoanOriginationQuarter

subset <- c("AmountDelinquent", "CurrentDelinquencies", "OnTimeProsperPayments", "LoanCurrentDaysDelinquent","LoanOriginationQuarter")

plot_data <- data[subset] %>% 
  group_by(LoanOriginationQuarter) %>% 
  summarize_all(funs(mean(., na.rm = TRUE))) %>%
  gather(Measure, Value, -LoanOriginationQuarter)

ggplot(plot_data, aes(x = LoanOriginationQuarter, y=Value)) +
  geom_bar(stat = "identity") +
  labs(title = "Delinquency by LoanOriginationQuarter") +
  facet_grid(~Measure, scales="free") +
  coord_flip() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 14 rows containing missing values (position_stack).

Without knowing how the loans ultimately panned out, it is difficult to use this data to make predictions about future loans.

Final Plots and Summary

In each graph above that shows a noticeable relationship between profit predictors and profit measures, the Prosper rating is inversely correlated with the profit measures. There is also a consistent relationship between lender yield and estimated loss: the more the lender stands to gain, the more they stand to lose. I look at this in more detail below. Finally, the estimated effective yield is always a bit less than the estimated yield, which also reflects the estimated loss.
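The relationship between lender yield and estimated loss can also be checked numerically rather than only by eye. The sketch below computes a Pearson correlation on a small made-up data frame; the values and the name `loans` are hypothetical, and only the column names `LenderYield` and `EstimatedLoss` come from the dataset.

```r
# Toy stand-in for the two columns; the numbers are made up for illustration.
loans <- data.frame(
  LenderYield   = c(0.08, 0.12, 0.15, 0.21, 0.30, NA),
  EstimatedLoss = c(0.02, 0.04, 0.06, 0.10, 0.16, 0.05)
)

# Pearson correlation, dropping rows where either value is missing.
r <- cor(loans$LenderYield, loans$EstimatedLoss, use = "complete.obs")

r
```

On the real data the call would be `cor(data$LenderYield, data$EstimatedLoss, use = "complete.obs")`. A value close to 1 would support the reading that higher potential yield comes with higher expected loss, though it says nothing about how the loans actually performed.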

Assuming that the various profit measures, which may reflect only profit for clients/lenders, rather than for the company itself, are in fact what we want to be looking at, it is possible to notice certain trends which may be worth looking at more closely.

Further, assuming that lenders also care about potential missed payments, particularly if these would put them in a financial bind, it is worth looking at strong demographic predictors of delinquency, which do not appear to be reflected in the Prosper rating.

Lender yield by estimated loss

Lender profit by number of open revolving accounts

Lender profit by loan origination quarter

Reflection

First, I encountered a fair bit of trouble interpreting the data without any background story. Googling around for info on Prosper loans online, I was able to get a general idea of what the company was doing, which made interpreting the data somewhat easier.

At this point, it is still difficult to say much about this data without knowing, in a lot more detail: what realities the less obvious measures reflect; the story behind the data; how certain measures are gathered and determined; and how the various measures reflect on both profit for the company, and profit for the clients. What would be needed is to take a much closer and more in-depth look at what the company does, what purpose the data serves, and how the measures were collected and what they reflect.